ABSTRACT


This study compares the strategies of code reading, functional testing, and structural testing in three aspects of software testing: fault detection effectiveness, fault detection cost, and classes of faults detected. Thirty-two professional programmers and 42 advanced students applied the three techniques to four unit-sized programs in a fractional factorial experimental design. The major results of this study are the following. 1) With the professional programmers, code reading detected more software faults and had a higher fault detection rate than did functional or structural testing, while functional testing detected more faults than did structural testing, but functional and structural testing were not different in fault detection rate. 2) In one advanced student subject group, code reading and functional testing were not different in faults found, but were both superior to structural testing, while in the other advanced student subject group there was no difference among the techniques. 3) With the advanced student subjects, the three techniques were not different in fault detection rate. 4) The number of faults observed, fault detection rate, and total effort in detection depended on the type of software tested. 5) Code reading detected more interface faults than did the other methods. 6) Functional testing detected more control faults than did the other methods. 7) When asked to estimate the percentage of faults detected, code readers gave the most accurate estimates while functional testers gave the least accurate estimates.
1. Introduction

The processes of software testing and defect detection continue to challenge the software community. Even though software testing and defect detection activities are inexact and inadequately understood, they are crucial to the success of a software project. The controlled study presented here addresses the uncertainty of how to test software effectively. In this investigation, common testing techniques were applied to different types of software by subjects who had a wide range of professional experience. This work is intended to characterize how testing effectiveness relates to several factors: testing technique, software type, fault type, tester experience, and any interactions among these factors. This examination extends previous work by incorporating different testing techniques and a greater number of persons and programs, while broadening the scope of issues examined and adding statistical significance to the conclusions.

The following sections describe the testing techniques examined, the investigation goals, the experimental design, operation, analysis, and conclusions.

2. Testing Techniques

To demonstrate that a particular program actually meets its specifications, professional software developers currently use many different testing methods. Before the goals for the empirical study comparing the popular techniques of code reading, functional testing, and structural testing are presented, the testing strategies and their different capabilities are described (see Figure 1). In functional testing, which is a "black box" approach [Howden 80], a programmer constructs test data from the program's specification through methods such as equivalence partitioning and boundary value analysis [Myers 79]. The programmer then executes the program and contrasts its actual behavior with that indicated in the specification. In structural testing, which is a "white box" approach [Howden 78, Howden 81], a programmer inspects the source code and then devises and executes test cases based on the percentage of the program's statements or expressions executed (the "test set coverage") [Stucki 77]. The structural coverage criterion used was 100% statement coverage. In code reading by stepwise abstraction, a person identifies prime subprograms in the software, determines their functions, and composes these functions to determine a function for the entire program [Mills 72, Linger, Mills & Witt 70]. The code reader then compares this derived function with the specifications (the intended function). In order to contrast these various strategies, an empirical study has been conducted using the techniques of code reading, functional testing, and structural testing.
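The structural criterion above, 100% statement coverage, can be made concrete with a small measurement harness. The following Python sketch is illustrative only: the study's programs were written in FORTRAN and SIMPL-T, and the function `classify`, the helper names, and the use of `sys.settrace` to record executed lines are assumptions introduced here. It reports which statements of a hypothetical program were executed by a test set. The three inputs in the second run are also one representative from each equivalence class of the hypothetical specification (negative, zero, positive), which is how a functional tester might have chosen them.

```python
import ast
import sys

# Hypothetical program under test (not one of the study's programs). It is
# kept as source text so the ast module can enumerate its statements.
PROGRAM = """\
def classify(n):
    if n < 0:
        label = 'negative'
    elif n == 0:
        label = 'zero'
    else:
        label = 'positive'
    return label
"""
FILENAME = "<program-under-test>"


def statement_lines(source):
    """Return the line numbers of all statements inside function bodies."""
    lines = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            for stmt in node.body:
                for inner in ast.walk(stmt):
                    if isinstance(inner, ast.stmt):
                        lines.add(inner.lineno)
    return lines


def statement_coverage(source, func_name, test_inputs):
    """Run func_name on each test input; return (coverage, missed lines)."""
    namespace = {}
    exec(compile(source, FILENAME, "exec"), namespace)
    func = namespace[func_name]
    executed = set()

    def tracer(frame, event, arg):
        # Trace only frames whose code was compiled from the program under test.
        if frame.f_code.co_filename != FILENAME:
            return None
        if event == "line":
            executed.add(frame.f_lineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        for value in test_inputs:
            func(value)
    finally:
        sys.settrace(previous)  # restore any tracer that was already active

    wanted = statement_lines(source)
    covered = wanted & executed
    return len(covered) / len(wanted), wanted - covered


ratio, missed = statement_coverage(PROGRAM, "classify", [5])
print(f"{ratio:.0%} statement coverage, missed lines {sorted(missed)}")
# 67% statement coverage, missed lines [3, 5]
ratio, missed = statement_coverage(PROGRAM, "classify", [-1, 0, 5])
print(f"{ratio:.0%} statement coverage")
# 100% statement coverage
```

A structural tester working to this criterion would keep adding test cases until the missed set is empty.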

2.1. Investigation Goals

The goals of this study comprise three different aspects of software testing: fault detection effectiveness, fault detection cost, and classes of faults detected. An application of the goal/question/metric paradigm [Basili & Selby 84, Basili & Weiss 84] leads to the framework of goals and questions for this study appearing in Figure 2.

The first goal area is performance oriented and includes a natural first question (I.A): which of the techniques detects the most faults in the programs? Because the comparison between the techniques is made across programs, each with a different number of faults, an alternative is to compare the percentage of faults found in the programs (question I.A.1). The number of faults that a technique exposes should also be compared; that is, faults that are made observable but not necessarily observed and reported by a tester (I.A.2). Because of the differences in types of software and in testers' abilities, it is relevant to determine whether the number of faults detected is either program or programmer dependent (I.B, I.C). Since one technique may find a few more faults than another, it becomes useful to know how much effort that technique requires (II.A). Awareness of what types of software require more effort to test (II.B) and what types of programmer backgrounds require less effort in fault uncovering (II.C) is also quite useful. If one is interested in detecting certain classes of faults, such as in error-based testing [Foster 80, Valdes & Goel 83], it is appropriate to apply a technique sensitive to that particular type (III.A). Classifying the types of faults that are observable yet go unreported could help focus and increase testing effectiveness (III.B).
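The goal/question framework just described can be pictured as a small tree in which each goal area owns its questions. The Python sketch below is a hypothetical encoding for illustration: the class names are invented, the question wording paraphrases the preceding paragraph rather than reproducing Figure 2, and the metrics layer of the paradigm is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    label: str
    text: str


@dataclass
class Goal:
    label: str
    aspect: str
    questions: list = field(default_factory=list)


# Paraphrased from the text; the exact wording of Figure 2 may differ.
GOALS = [
    Goal("I", "fault detection effectiveness", [
        Question("I.A", "Which technique detects the most faults?"),
        Question("I.A.1", "Which technique detects the highest percentage of faults?"),
        Question("I.A.2", "Which technique exposes the most faults?"),
        Question("I.B", "Is the number of faults detected program dependent?"),
        Question("I.C", "Is the number of faults detected programmer dependent?"),
    ]),
    Goal("II", "fault detection cost", [
        Question("II.A", "How much effort does each technique require?"),
        Question("II.B", "Which types of software require more effort to test?"),
        Question("II.C", "Which programmer backgrounds require less effort?"),
    ]),
    Goal("III", "classes of faults detected", [
        Question("III.A", "Which technique is most sensitive to each fault class?"),
        Question("III.B", "Which classes of observable faults go unreported?"),
    ]),
]


def question_index(goals):
    """Map each question label to (goal aspect, question text)."""
    return {q.label: (g.aspect, q.text) for g in goals for q in g.questions}


print(question_index(GOALS)["II.A"])
```

Keeping the labels hierarchical (I.A, I.A.1, ...) makes it straightforward to trace each later analysis back to the goal it serves.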

3. Empirical Study

Admittedly, the goals stated here are quite ambitious. In no way is it implied that this study can definitively answer all of these questions for all environments. It is intended, however, that the statistical analysis presented lends insight into their answers and into the merit and appropriateness of each of the techniques. Note that this study compares the individual application of the three testing techniques in order to identify their distinct advantages and disadvantages. This approach is a first step toward proposing a composite testing strategy, which possibly incorporates several testing methods. The following sections describe the empirical study undertaken to pursue these goals and questions, including the selection of subjects, programs, and experimental design, and the overall operation of the study.

3.1. Iterative Experimentation

The empirical study consisted of three phases. The first and second phases of the study took place at the University of Maryland in the Falls of 1982 and 1983, respectively. The third phase took place at Computer Sciences Corporation (CSC, Silver Spring, MD) and NASA Goddard Space Flight Center (Greenbelt, MD) in the Fall of 1984. The sequential experimentation supported the iterative nature of the learning process, and enabled the initial set of goals and questions to be expanded and resolved by further analysis. The goals were further refined by discussions of the preliminary results [Selby 83, Selby 84]. These three phases enabled the pursuit of result reproducibility across environments having subjects with a wide range of experience.

3.2. Subject and Program/Fault Selection

A primary consideration in this study was to use a realistic testing environment to assess the effectiveness of these different testing strategies, as opposed to creating a best possible testing situation [Hetzel 78]. Thus, 1) the subjects for the study were chosen to be representative of different levels of expertise, 2) the programs tested correspond to different types of software and reflect common programming style, and 3) the faults in the programs were representative of those frequently occurring in software. Sampling the subjects, programs, and faults in this manner is intended to evaluate the testing methods reasonably, and to facilitate the generalization of the results to other environments.


3.2.1. Subjects

The three phases of the study incorporated a total of 74 subjects; the individual phases had 29, 13, and 32 subjects, respectively. The subjects were selected, based on several criteria, to be representative of three different levels of computer science expertise: advanced, intermediate, and junior. The number of subjects in each level of expertise for the different phases appears in Figure 3.

The 42 subjects in the first two phases of the study were the members of the upper-level "Software Design and Development" course at the University of Maryland in the Falls of 1982 and 1983. The individuals were either upper-level computer science majors or graduate students; some were working part-time and all were in good academic standing. The topics of the course included structured programming practices, functional correctness, top-down design, modular specification and design, step-wise refinement, and PDL, in addition to the presentation of the techniques of code reading, functional testing, and structural testing. The references for the testing methods were [Mills 75, Fagan 76, Myers 79, Howden 80], and the lectures were presented by V. R. Basili and F. T. Baker. The subjects from the University of Maryland spanned the intermediate and junior levels of computer science expertise. The assignment of individuals to levels of expertise was based on professional experience and prior academic performance in relevant computer science courses. The individuals in the first and second phases had overall averages of 1.7 (SD = 1.7) and 1.5 (SD = 1.5) years of professional experience. The nine intermediate subjects in the first phase had from 2.8 to 7 years of professional experience (average of 3.9 years, SD = 1.3), and the four in the second phase had from 2.3 to 5.5 years of professional experience (average of 3.2, SD = 1.5). The twenty junior subjects in the first phase and the nine in the second phase both had from 0 to 2 years of professional experience (averages of 0.7, SD = 0.8, and 0.8, SD = 0.8, respectively).

The 32 subjects in the third phase of the study were programming professionals from NASA and Computer Sciences Corporation. These individuals were mathematicians, physicists, and engineers who develop ground support software for satellites. They were familiar with all three testing techniques, but had used functional testing primarily. A four-hour tutorial on the testing techniques was conducted for the subjects by R. W. Selby. This group of subjects, examined in the third phase of the experiment, spanned all three expertise levels and had an overall average of 10.0 (SD = 5.7) years of professional experience. Several criteria were considered in the assignment of subjects to expertise levels, including years of professional experience, degree background, and their manager's suggested assignment. The eight advanced subjects ranged from 9.5 to 20.5 years of professional experience (average of 15.0, SD = 4.1). The eleven intermediate subjects ranged from 3.5 to 17.5 years of experience (average of 10.9, SD = 4.9). The thirteen junior subjects ranged from 1.5 to 13.5 years of experience (average of 6.1, SD = 4.4).

3.2.2. Programs

The experimental design enables the distinction of the testing techniques while allowing for the effects of the different programs being tested. The four programs used in the investigation were chosen to be representative of several different types of software. The programs were selected specially for the study and were provided to the subjects for testing; the subjects did not test programs that they had written. All programs were written in a high-level language with which the subjects were familiar. The three programs tested in the CSC/NASA phase were written in FORTRAN, and the programs tested in the University of Maryland phases were written in the SIMPL-T structured programming language [Basili & Turner 76].¹ The four programs tested were P1) a text processor, P2) a mathematical plotting routine, P3) a numeric abstract data type, and P4) a database maintainer. The programs are summarized in Figure 4. There is some variation in size, and the programs are a realistic size for unit testing. Each of the subjects tested three programs, but a total of four programs was used across the three phases of the study. The programs tested in each of the three phases of the study appear in Figure 5. The specifications for the programs appear in Appendix A, and their source code appears in Appendix B.

The first program is a text formatting program, which also appeared in [Myers 78]. A version of this program, originally written by Naur [Naur 69] using techniques of program correctness proofs, was analyzed in [Goodenough & Gerhart 75]. The second program is a mathematical plotting routine. This program was written by R. W. Selby, based roughly on a sample program in [Jensen & Wirth 74]. The third program is a numeric data abstraction consisting of a set of list processing utilities. This program was submitted for a class project by a member of an intermediate level programming course at the University of Maryland [McMullin & Gannon 80]. The fourth program is a maintainer for a database of bibliographic references. This program was analyzed in [Hetzel 76], and was written by a systems programmer at the University of North Carolina computation center.

¹ SIMPL-T is a structured language that supports several string and file handling primitives, in addition to the usual control flow constructs available, for example, in Pascal.

Note that the source code for the programs contains no comments. This creates a worst-case situation for the code readers. In an environment where code contained helpful comments, the performance of code readers would likely improve, especially if the source code contained as comments the intermediate functions of the program segments. In an environment where the comments were at all suspect, they could then be ignored.

3.2.3. Faults

The faults contained in the programs tested represent a reasonable distribution of faults that commonly occur in software [Weiss & Basili 85, Basili & Perricone 84]. All the faults in the database maintainer and the numeric abstract data type were made during the actual development of the programs. The other two programs contain a mix of faults made by the original programmer and faults seeded in the code. The programs contained a total of 34 faults; the text formatter had nine, the plotting routine had six, the abstract data type had seven, and the database maintainer had twelve.

3.2.3.1. Fault Origin

The faults in the text formatter were preserved from the article in which it appeared [Myers 78], except for some of the more controversial ones [Cailliau & Rubin 79]. In the mathematical plotter, faults made during program translation were supplemented by additional representative faults. The faults in the abstract data type were the original ones made by the program's author during the development of the program. The faults in the database maintainer were recorded during the development of the program, and then reinserted into the program. The next section describes a classification of the different types of faults in the programs. Note that this investigation of the fault detecting ability of these techniques involves only those types occurring in the source code, not other types such as those in the requirements or the specifications.


3.2.3.2. Fault Classification

The faults in the programs are classified according to two different abstract classification schemes [Basili & Perricone 84]. One fault categorization method separates faults of omission from faults of commission. Faults of commission are those faults present as a result of an incorrect segment of existing code. For example, the wrong arithmetic operator is used for a computation in the right-hand side of an assignment statement. Faults of omission are those faults present as a result of a programmer's forgetting to include some entity in a module. For example, a statement is missing from the code that would assign the proper value to a variable.
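The two kinds of fault can be illustrated on a tiny, hypothetical routine. In the Python sketch below (the function names and the pricing task are invented for illustration and do not come from the study's programs), the commission fault uses the wrong arithmetic operator in existing code, while the omission fault leaves out the statement that would assign a variable its proper value.

```python
def net_price(price, discount_rate, tax_rate):
    """Correct version: apply the discount, then add tax."""
    discounted = price - price * discount_rate
    return discounted + discounted * tax_rate


def net_price_commission(price, discount_rate, tax_rate):
    """Fault of commission: existing code uses the wrong operator (+ for -)."""
    discounted = price + price * discount_rate
    return discounted + discounted * tax_rate


def net_price_omission(price, discount_rate, tax_rate):
    """Fault of omission: the statement applying the discount is missing."""
    discounted = price
    # missing: discounted = discounted - discounted * discount_rate
    return discounted + discounted * tax_rate


for version in (net_price, net_price_commission, net_price_omission):
    print(version.__name__, version(100, 0.1, 0.2))
```

Under the second scheme in this section, the commission fault above would also count as a computation fault, since a wrong operator in an assignment makes a calculation evaluate incorrectly.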

A second fault categorization scheme partitions software faults into the six classes of 1) initialization, 2) computation, 3) control, 4) interface, 5) data, and 6) cosmetic. Improperly initializing a data structure constitutes an initialization fault; an example is assigning a variable the wrong value on entry to a module. Computation faults are those that cause a calculation to evaluate the value for a variable incorrectly. The above example of a wrong arithmetic operator in the right-hand side of an assignment statement would be a computation fault. A control fault causes the wrong control flow path in a program to be taken for some input. An incorrect predicate in an IF-THEN-ELSE statement would be a control fault. Interface faults result when a module uses and makes assumptions about entities outside the module's local environment. Interface faults would be, for example, passing an incorrect argument to a procedure, or assuming in a module that an array passed as an argument was filled with blanks by the passing routine. Data faults are those that result from the incorrect use of a data structure, for example, incorrectly determining the index for the last element in an array. Finally, cosmetic faults are clerical mistakes made when entering the program. A spelling mistake in an error message would be a cosmetic fault.

Interpreting and classifying faults in software is a difficult and inexact task. The categorization process often requires trying to recreate the original programmer's misunderstanding of the problem [Johnson, Draper & Soloway 83]. The above two fault classification schemes attempt to distinguish among different reasons that programmers make faults in software development. They were applied consistently to the faults in the programs; it is certainly possible that another analyst could have interpreted them differently. The separate application of each of the two classification schemes to the faults categorized them in a mutually exclusive and exhaustive manner. Figure 6 displays the distribution of faults in the programs according to these schemes.
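Because each scheme is applied separately and places every fault in exactly one category, each fault can be recorded as a pair of labels, and a distribution like the one in Figure 6 becomes a simple cross-tabulation. The Python sketch below shows this bookkeeping; the enum names are our own, and the four sample faults are hypothetical, not the study's 34 faults.

```python
from collections import Counter
from enum import Enum


class Kind(Enum):
    OMISSION = "omission"
    COMMISSION = "commission"


class FaultClass(Enum):
    INITIALIZATION = 1
    COMPUTATION = 2
    CONTROL = 3
    INTERFACE = 4
    DATA = 5
    COSMETIC = 6


# Hypothetical faults: (program, kind, class). Recording exactly one member
# of each enum per fault keeps both schemes mutually exclusive and exhaustive.
FAULTS = [
    ("P1", Kind.COMMISSION, FaultClass.COMPUTATION),
    ("P1", Kind.OMISSION, FaultClass.INITIALIZATION),
    ("P2", Kind.COMMISSION, FaultClass.CONTROL),
    ("P2", Kind.COMMISSION, FaultClass.COSMETIC),
]


def cross_tab(faults):
    """Count faults in every (kind, class) cell, including empty cells."""
    counts = Counter((kind, cls) for _, kind, cls in faults)
    return {(k, c): counts[(k, c)] for k in Kind for c in FaultClass}


for (kind, cls), n in cross_tab(FAULTS).items():
    if n:
        print(kind.value, cls.name.lower(), n)
```

Summing a row of the table recovers the omission and commission totals, and summing a column recovers the totals for each of the six classes.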

3.2.3.3. Fault Description

The faults in the programs are described in Figure 7. There have been various efforts to determine a precise counting scheme for "defects" in software [Gloss-Soler 79, IEEE 83]. According to the explanations given, a software "fault" is a specific manifestation in the source code of a programmer "error." For example, due to a misconception or document discrepancy, a programmer commits an "error" (in his/her head) that may result in more than one "fault" in a program. Using this interpretation, software "faults" reflect the correctness, or lack thereof, in a program. The entities examined in this analysis are software faults.

3.3. Experimental Design

The experimental design applied for each of the three phases of the study was a fractional factorial design [Cochran & Cox 50, Box, Hunter & Hunter 78]. This experimental design distinguishes among the testing techniques, while allowing for variation in the ability of the particular individual doing the testing or in the program being tested. Figure 8 displays the fractional factorial design appropriate for the third phase of the study. Subject S1 is in the advanced expertise level, and he structurally tested program P1, functionally tested program P3, and code read program P4. Notice that all of the subjects tested each of the three programs and used each of the three techniques. Of course, no one tests a given program more than once. The design appropriate for the third phase is discussed in the following paragraphs, with the minor differences between this design and the ones applied in the first two phases being discussed at the end of the section.
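The structural constraints of the design (every subject tests each program once and uses each technique once, and every combination of expertise level, technique, and program occurs) can be checked mechanically. The Python sketch below assumes a design is stored as a mapping from subject to (level, {program: technique}); the helper names are hypothetical, and apart from subject S1, whose assignment is given in the text, the generated entries are illustrative and do not reproduce Figure 8.

```python
from itertools import product

TECHNIQUES = ("structural", "functional", "code reading")
PROGRAMS = ("P1", "P3", "P4")  # third-phase programs
LEVELS = ("advanced", "intermediate", "junior")


def check_design(design):
    """Return a list of violated constraints (empty if the design is valid)."""
    problems = []
    seen = set()
    for subject, (level, plan) in design.items():
        if set(plan) != set(PROGRAMS):
            problems.append(f"{subject} does not test each program exactly once")
        if sorted(plan.values()) != sorted(TECHNIQUES):
            problems.append(f"{subject} does not use each technique exactly once")
        for program, technique in plan.items():
            seen.add((level, technique, program))
    for combo in product(LEVELS, TECHNIQUES, PROGRAMS):
        if combo not in seen:
            problems.append(f"missing combination {combo}")
    return problems


def rotated_design(subjects_per_level=3):
    """Illustrative design: rotate techniques across programs within a level."""
    design = {}
    number = 1
    for level in LEVELS:
        for shift in range(subjects_per_level):
            plan = {p: TECHNIQUES[(i + shift) % 3] for i, p in enumerate(PROGRAMS)}
            design[f"S{number}"] = (level, plan)
            number += 1
    return design


print(check_design(rotated_design()))  # [] means every constraint holds
```

Three subjects per level with rotated assignments form a Latin square, which is the smallest arrangement in which all nine technique/program pairs appear at every level.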

3.3.1. Independent and Dependent Variables

The experimental design has the three independent variables of testing technique, software type, and level of expertise. For the design appearing in Figure 8, appropriate for the third phase of the study, the three main effects have the following levels:

1) testing technique: code reading, functional testing, and structural testing

2) software type: (P1) text processing, (P3) numeric abstract data type, and (P4) database maintainer

3) level of expertise: advanced, intermediate, and junior

Every combination of these levels occurs in the design. That is, programmers in all three levels of expertise applied all three testing techniques on all programs. In addition to these three main effects, a factorial analysis of variance (ANOVA) model supports the analysis of interactions among these main effects. Thus, the interaction effects of testing technique * software type, testing technique * expertise level, software type * expertise level, and the three-way interaction of testing technique * software type * expertise level are included in the model. Several dependent variables are examined in the study, including the number of faults detected, percentage of faults detected, total fault detection time, and fault detection rate. For the on-line methods of functional and structural testing, the dependent variables also included the number of computer runs, amount of CPU time consumed, maximum statement coverage achieved, connect time used, number of faults that were observable from the test data, percentage of faults that were observable from the test data, and percentage of faults observable from the test data that were actually observed by the tester.
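Several of these dependent variables are simple ratios of raw counts and times. The Python sketch below derives them for a single observation (one subject applying one technique to one program). The class and field names are hypothetical, fault detection rate is assumed here to mean faults detected per hour, the example numbers are invented (only the nine-fault total of the text formatter comes from the study), and the last two ratios apply only to the on-line techniques.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    faults_in_program: int  # e.g., 9 for the text formatter
    faults_detected: int    # faults the tester reported
    faults_observable: int  # faults made observable by the test data
    faults_observed: int    # observable faults the tester actually noticed
    minutes: float          # total fault detection time

    @property
    def percent_detected(self):
        return 100.0 * self.faults_detected / self.faults_in_program

    @property
    def detection_rate(self):
        """Faults detected per hour of effort (assumed definition)."""
        return self.faults_detected / (self.minutes / 60.0)

    @property
    def percent_observable(self):
        return 100.0 * self.faults_observable / self.faults_in_program

    @property
    def percent_observable_observed(self):
        # Undefined when the test data made no fault observable.
        if self.faults_observable == 0:
            return None
        return 100.0 * self.faults_observed / self.faults_observable


obs = Observation(faults_in_program=9, faults_detected=5,
                  faults_observable=7, faults_observed=5, minutes=150)
print(f"{obs.percent_detected:.1f}% detected, {obs.detection_rate:.1f} faults/hour")
```

Normalizing by the number of faults in each program is what allows the percentage measures to be compared across programs that contain different numbers of faults.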

3.3.2. Analysis of Variance Model

The three main effects and all the two-way and three-way interaction effects are called fixed effects in this factorial analysis of variance model. The levels of these effects given above represent all levels of interest in the investigation. For example, the effect of testing technique has as particular levels code reading, functional testing, and structural testing; these particular testing techniques are the only ones under comparison in this study. The effect of the particular subjects that participated in this study requires a somewhat different interpretation. The subjects examined in the study were random samples of programmers from the large population of programmers at each of the levels of expertise. Thus, the effect of the subjects on the various dependent variables is a random variable, and this effect therefore is called a random effect. If the samples examined are truly representative of the population of subjects at each expertise level, the inferences from the analysis can then be generalized across the whole population of subjects at each expertise level, not just across the particular subjects in the sample chosen. Since this analysis of variance model contains both fixed and random effects, it is called a mixed model. The actual ANOVA model for the design appearing in Figure 8 is given below.

$$y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + \delta_{kl} + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + \epsilon_{ijkl}$$

where

$y_{ijkl}$ is the observed response from subject $l$ of expertise level $k$ using testing technique $i$ on program $j$

$\mu$ is the overall mean response

$\alpha_i$ is the main effect of testing technique $i$ ($i = 1, 2, 3$)

$\beta_j$ is the main effect of program $j$ ($j = 1, 3, 4$)

$\gamma_k$ is the main effect of expertise level $k$ ($k = 1, 2, 3$)

$\delta_{kl}$ is the random effect of subject $l$ within expertise level $k$, a random variable ($l = 1, 2, \ldots, 32$; $k = 1, 2, 3$)

$(\alpha\beta)_{ij}$ is the interaction effect of testing technique $i$ with program $j$ ($i = 1, 2, 3$; $j = 1, 3, 4$)

$(\alpha\gamma)_{ik}$ is the interaction effect of testing technique $i$ with expertise level $k$ ($i = 1, 2, 3$; $k = 1, 2, 3$)

$(\beta\gamma)_{jk}$ is the interaction effect of program $j$ with expertise level $k$ ($j = 1, 3, 4$; $k = 1, 2, 3$)

$(\alpha\beta\gamma)_{ijk}$ is the interaction effect of testing technique $i$ with program $j$ with expertise level $k$ ($i = 1, 2, 3$; $j = 1, 3, 4$; $k = 1, 2, 3$)

$\epsilon_{ijkl}$ is the experimental error for each observation, a random variable
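To make the roles of the fixed and random terms concrete, the model can be run generatively. The Python sketch below draws one response for each technique/program pair a subject receives. The fixed effects $\mu$, $\alpha_i$, $\beta_j$, $\gamma_k$ are constants whose values are invented for illustration, the subject effect $\delta_{kl}$ is drawn once per subject from a normal distribution with mean zero, and $\epsilon_{ijkl}$ is drawn fresh for every observation. For brevity the interaction terms are set to zero; this is a simplification of the sketch, not an assumption of the study. The function name and the second subject's plan are hypothetical.

```python
import random

# Invented values; each set of effects sums to zero, a common convention
# that makes the main effects identifiable.
MU = 5.0
ALPHA = {"code reading": 1.0, "functional": 0.5, "structural": -1.5}
BETA = {"P1": 0.5, "P3": -1.0, "P4": 0.5}
GAMMA = {"advanced": 1.0, "intermediate": 0.0, "junior": -1.0}


def simulate(design, sigma_subject=1.0, sigma_error=0.5, seed=0):
    """Draw responses y_ijkl from the main-effects part of the mixed model.

    design maps subject -> (expertise level, {program: technique}).
    Interaction terms are taken as zero in this sketch.
    Returns a list of (subject, level, technique, program, y).
    """
    rng = random.Random(seed)
    rows = []
    for subject, (level, plan) in design.items():
        delta = rng.gauss(0.0, sigma_subject)  # random subject effect, drawn once
        for program, technique in plan.items():
            eps = rng.gauss(0.0, sigma_error)  # fresh error for each observation
            y = MU + ALPHA[technique] + BETA[program] + GAMMA[level] + delta + eps
            rows.append((subject, level, technique, program, y))
    return rows


design = {
    "S1": ("advanced", {"P1": "structural", "P3": "functional", "P4": "code reading"}),
    "S2": ("junior", {"P1": "code reading", "P3": "structural", "P4": "functional"}),
}
for row in simulate(design):
    print(row)
```

Because $\delta_{kl}$ is shared by all three of a subject's observations, those observations are correlated; this is why the subject effect, and not the per-observation error alone, matters when the expertise level effect is tested.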


The F tests of hypotheses on all the fixed effects mentioned above use the error (residual) mean square in the denominator, except for the test of the expertise level effect. The expected mean square for the expertise level effect contains a component for the actual variance of subjects within expertise level. In order to select the appropriate error term for the denominator of the expertise level F test, the mean square for the effect of subjects nested within expertise level is chosen. The parameters for the random effect of subjects within expertise level are assumed to be drawn from a normally distributed random process with mean zero and common variance. The experimental error terms are assumed to have mean zero and common variance.
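The choice of denominator can be illustrated in a simplified setting. Averaging each subject's responses over his or her observations and comparing levels against the spread of subjects within each level gives F = MS(level) / MS(subjects within level), with K - 1 and N - K degrees of freedom for K levels and N subjects. The Python sketch below computes this one-way ratio on per-subject means. It deliberately ignores the technique, program, and interaction terms of the full model, so it approximates, and does not replace, the mixed-model test described above; the function name and the subject means in the example are hypothetical.

```python
def expertise_f_ratio(subject_means_by_level):
    """One-way F ratio of expertise level against subjects within level.

    subject_means_by_level maps a level name to the list of per-subject
    mean responses for that level. Returns (F, df_between, df_within).
    """
    groups = list(subject_means_by_level.values())
    all_means = [m for g in groups for m in g]
    n, k = len(all_means), len(groups)
    grand = sum(all_means) / n
    # Between-level sum of squares, weighted by the subjects in each level.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    # Subjects-within-level sum of squares: spread around each level mean.
    ss_within = sum((m - sum(g) / len(g)) ** 2 for g in groups for m in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


example = {"advanced": [5, 6, 7], "intermediate": [8, 9, 10], "junior": [2, 3, 4]}
print(expertise_f_ratio(example))  # (27.0, 2, 6)
```

The denominator here reflects how much subjects differ within a level; using the residual mean square instead would ignore that source of variation and could overstate the significance of the expertise effect.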

The fractional factorial design applied in the first two phases of the analysis differed slightly from the one presented above for the third phase.² In the third phase of the study, programs P1, P3, and P4 were tested by subjects in three levels of expertise.
In both phases one and two, there were only subjects from the intermediate and junior levels of expertise. In phase one, programs P1, P3, and P2 were tested. In phase two, the programs tested were P1, P&, and P. The only modifications necessary to the above explanation for phases one and two are 1) eliminating the advanced expertise level, 2) changing the program subscripts appropriately, and 3) leaving out the three-way interaction term in phase two, because of the reduced number of subjects. In all three phases, all subjects used each of the three techniques and tested each of the three programs for that phase. Also, within all three phases, all possible combinations of expertise level, testing technique, and program occurred.

The order of presentation of the testing techniques was randomized among the subjects in each level of expertise in each phase of the study. However, the integrity of the results would have suffered if each of the programs in a given phase had been tested at different times by different subjects. Note that each of the testing sessions took place on a different day because of the amount of effort required. If different programs had been tested on different days, any discussion about the programs among subjects between testing sessions would have affected the future performance of others. Therefore, all subjects in a phase tested the same program on the same day. The actual order of program presentation was the order in which the programs are listed in the previous paragraph.

2 Although the data from all the phases can be analyzed together, the number of empty cells resulting from not having all three experience levels and all four programs in all phases limits the number of parameters that can be estimated and causes non-unique Type IV partial sums of squares.
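The scheduling constraint above (technique order randomized per subject, program order fixed for everyone in a phase) can be sketched as follows. This is a minimal illustration under simplifying assumptions: the function name `schedule_phase`, the subject IDs, and the program labels are hypothetical placeholders, and each subject's technique order is drawn independently, so the sketch does not by itself guarantee that every combination of expertise level, technique, and program occurs; a real design would check that separately.

```python
import random

TECHNIQUES = ("code reading", "functional testing", "structural testing")


def schedule_phase(subjects, program_order, seed=None):
    """Plan one phase: on day k every subject tests program_order[k].

    The program order is shared by all subjects (one program per day), while
    each subject's technique order is an independent random permutation.
    Returns {subject: [(program, technique), ...]} indexed by day.
    """
    if len(program_order) != len(TECHNIQUES):
        raise ValueError("a phase pairs three programs with three techniques")
    rng = random.Random(seed)
    plan = {}
    for subject in subjects:
        techniques = list(TECHNIQUES)
        rng.shuffle(techniques)  # randomized technique order for this subject
        plan[subject] = list(zip(program_order, techniques))
    return plan


# Placeholder subject IDs and program labels, for illustration only.
plan = schedule_phase(["S01", "S02", "S03", "S04"], ["P1", "P2", "P3"], seed=7)
for subject, sessions in plan.items():
    print(subject, sessions)
```

Fixing the program per day, rather than per subject, is what keeps between-session discussion from leaking details of a program that some subjects have not yet tested.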

3.4. Experimental Operation

Each of the three phases was broken into five distinct pieces: training, three testing sessions, and a follow-up session. All groups of subjects were exposed to a similar amount of training on the testing techniques before the study began. As mentioned earlier, the University of Maryland subjects were enrolled in the "Software Design and Development" course, and the NASA/CSC subjects were given a four-hour tutorial. Background information on the subjects was captured through a questionnaire. Elementary exercises followed by a pretest covering all techniques were administered to all subjects after the training and before the testing sessions. Reasonable effort on the part of the University of Maryland subjects was enforced by their being graded on the work and by their needing to use the techniques in a major class project. Reasonable effort on the part of the NASA/CSC subjects was certain because of their desire for the study's outcome to improve their software testing environment. All subject groups were judged highly motivated during the study. The subjects were all familiar with the editors, terminals, machines, and the programs' implementation language.

The individuals were requested to use the three testing techniques to the best of their ability. Every subject participated in all three testing sessions of his/her phase, using all techniques but each on a separate program. The individuals using code reading were each given the specification for the program and its source code. They were then asked to apply the methods of code reading by stepwise abstraction to detect discrepancies between the program's abstracted function and the specification. The functional testers were each given a specification and the ability to execute the program. They were asked to perform equivalence partitioning and boundary value analysis to select a set of test data for the program. Then they executed the program on this collection of test data, and inconsistencies between what the program actually performed and what they thought the specification said it should perform were noted. The structural testers were given the source code for the program, the ability to execute it, and a description of the input format for the program. The structural testers were asked to examine the source and generate a set of test cases that cumulatively execute 100% of the program's statements. When the subjects were applying an on-line technique, they generated and executed their own test data; no test data sets were provided. The programs were invoked through a test driver that supported the use of multiple input data sets. This test driver, unbeknown to the subjects, drained off the input cases submitted to the program for the experimenter's later analysis; the programs could only be accessed through the test driver.
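As a hedged illustration of the functional testers' task, the sketch below applies equivalence partitioning and boundary value analysis to a single integer input. The specification it assumes (a hypothetical line-width parameter valid from 1 to 80, drawn from a wider domain of -1000 to 1000) is invented for the example and is not one of the study's programs; the names `EquivalenceClass`, `partition`, and `select_test_data` are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EquivalenceClass:
    name: str
    low: int    # inclusive lower bound
    high: int   # inclusive upper bound
    valid: bool


def partition(lo, hi, domain_min, domain_max):
    """Split [domain_min, domain_max] into classes below, inside, and above [lo, hi]."""
    return [
        EquivalenceClass("below range", domain_min, lo - 1, False),
        EquivalenceClass("in range", lo, hi, True),
        EquivalenceClass("above range", hi + 1, domain_max, False),
    ]


def select_test_data(lo, hi, domain_min, domain_max):
    cases = set()
    for cls in partition(lo, hi, domain_min, domain_max):
        if cls.low <= cls.high:  # skip empty classes
            cases.add((cls.low + cls.high) // 2)  # one representative per class
    # Boundary value analysis: just outside, on, and just inside each edge.
    for value in (lo - 1, lo, lo + 1, hi - 1, hi, hi + 1):
        if domain_min <= value <= domain_max:
            cases.add(value)
    return sorted(cases)


print(select_test_data(1, 80, -1000, 1000))
# [-500, 0, 1, 2, 40, 79, 80, 81, 540]
```

A functional tester would then run these inputs through the program and compare each output with what the specification says it should be.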

A structural coverage tool calculated the actual statement coverage of the test set and which statements were left unexecuted for the structural testers. After the structural testers generated a collection of test data that met (or almost met) the 100% coverage criterion, no further execution of the program or reference to the source code was allowed. They retained the program's output from the test cases they had generated. These testers were then provided with the program's specification. Now that they knew what the program was intended to do, they were asked to contrast the program's specification with the behavior of the program on the test data they derived. This scenario for the structural testers was necessary so that "observed" faults could be compared.
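The coverage tool itself is not described in detail here. As a hypothetical stand-in, the sketch below hand-instruments a toy program with one probe per statement and reports cumulative statement coverage and the statements still unexecuted, which is the kind of feedback the structural testers received. `CoverageProbe` and `clamp_width` are illustrative names, not the study's tooling, and for simplicity the `if`/`elif` tests themselves are not counted as statements.

```python
class CoverageProbe:
    """Accumulates which labelled statements have executed across test cases."""

    def __init__(self, statement_ids):
        self.all_ids = frozenset(statement_ids)
        self.executed = set()

    def hit(self, statement_id):
        self.executed.add(statement_id)

    def coverage(self):
        """Percentage of statements executed so far by the whole test set."""
        return 100.0 * len(self.executed & self.all_ids) / len(self.all_ids)

    def unexecuted(self):
        return sorted(self.all_ids - self.executed)


probe = CoverageProbe(["s1", "s2", "s3", "s4"])


def clamp_width(width):
    # Hypothetical program under test, instrumented by hand.
    probe.hit("s1")
    result = width
    if width < 1:
        probe.hit("s2")
        result = 1
    elif width > 80:
        probe.hit("s3")
        result = 80
    probe.hit("s4")
    return result


for width in (40, 0):                        # first two test cases
    clamp_width(width)
print(probe.coverage(), probe.unexecuted())  # 75.0 ['s3']
clamp_width(100)                             # reaches the remaining branch
print(probe.coverage(), probe.unexecuted())  # 100.0 []
```

Coverage is cumulative over the test set, so a tester keeps adding cases until the unexecuted list is empty (or nearly so).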

At the end of each of the testing sessions, the subjects were asked to give a reasonable estimate of the amount of time spent detecting faults with a given testing technique. The University of Maryland subjects were assured that this had nothing to do with the grading of the work. There seemed to be little incentive for the subjects in any of the groups not to be truthful. At the completion of each testing session, the NASA/CSC subjects were also asked what percentage of the faults in the program they thought they had uncovered. After all three testing sessions in a given phase were completed, the subjects were requested to critique and evaluate the three testing techniques regarding their understandability, naturalness, and effectiveness. The University of Maryland subjects submitted a written critique, while a two-hour debriefing forum was conducted for the NASA/CSC individuals. In addition to obtaining the impressions of the individuals, these follow-up procedures gave an understanding of how well the subjects were comprehending and applying the methods. These final sessions also afforded the participants an opportunity to comment on any particular problems they had with the techniques or in applying them to the given programs.

4. Data Analysis

The analysis of the data collected from the various phases of the experiment is presented according to the goal and question framework discussed earlier.

4.1. Fault Detection Effectiveness

The first goal area addresses the fault detection effectiveness of each of the techniques. Figure 9 presents a summary of the measures that were examined to pursue this goal area. A brief description of each measure is as follows; (*) marks a measure relevant only for on-line testing.
a) # Faults detected - the number of faults detected by a subject applying a given testing technique to a given program.
b) % Faults detected - the percentage of a program's faults that a subject detected by applying a testing technique to the program.
c) # Faults observable (*) - the number of faults that were observable from the program's behavior given the input data submitted.
d) % Faults observable (*) - the percentage of a program's faults that were observable from the program's behavior given the input data submitted.
e) % Detected/observable (*) - the percentage of faults observable from the program's behavior on the given input set that were actually observed by a subject.
f) % Faults felt found - a subject's estimate of the percentage of a program's faults that he/she thought were detected by his/her testing.
g) Maximum statement coverage (*) - the maximum percentage of a program's statements that were executed in a set of test cases.
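The percentage measures b), d), and e) are simple ratios of the counts. The sketch below computes them for one session record. The `Session` class and its field names are hypothetical and the counts are made up; following the text, it treats all of a program's faults as observable under code reading.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    """One subject applying one technique to one program (hypothetical fields)."""
    total_faults: int                 # faults known to be in the program
    detected: int                     # measure a): faults the subject reported
    observable: Optional[int] = None  # measure c); None means code reading

    def _observable(self) -> int:
        # With a nonexecution-based technique every fault counts as observable.
        return self.total_faults if self.observable is None else self.observable

    def pct_detected(self) -> float:  # measure b)
        return 100.0 * self.detected / self.total_faults

    def pct_observable(self) -> float:  # measure d)
        return 100.0 * self._observable() / self.total_faults

    def pct_detected_of_observable(self) -> Optional[float]:  # measure e)
        observable = self._observable()
        if observable == 0:
            return None  # nothing was revealed, so the ratio is undefined
        return 100.0 * self.detected / observable


functional = Session(total_faults=10, detected=6, observable=8)
reading = Session(total_faults=10, detected=7)
print(functional.pct_detected(), functional.pct_observable(),
      functional.pct_detected_of_observable())  # 60.0 80.0 75.0
print(reading.pct_detected(), reading.pct_observable())  # 70.0 100.0
```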

4.1.1. Data Distributions

The actual distribution of the number of faults observed by the subjects appears in Figure 10, broken down by phase. From Figures 9 and 10, the large variation in performance among the subjects is clearly seen. The mean number of faults detected by the subjects is displayed in Figure 11, broken down by technique, program, expertise level, and phase.
4.1.2. Number of Faults Detected

The first question under this goal area asks which of the testing techniques detected the most faults in the programs. The overall F-test of the techniques detecting an equal number of faults in the programs is rejected in the first and third phases of the study (α < .024 and α < .0001, respectively; not rejected in phase two, α > .05). Recall that the phase three data were collected from 32 NASA/CSC subjects, and the phase one data were from 29 University of Maryland subjects. With the phase three data, the contrast of "reading - 0.5 * (functional + structural)" estimates that the technique of code reading by stepwise abstraction detected 1.24 more faults per program than did either of the other techniques (α < .0001, c.l. 0.73 - 1.75).3 Note that code reading performed well even though the professional subjects' primary experience was with functional testing. Also with the phase three data, the contrast of "functional - structural" estimates that the technique of functional testing detected 1.11 more faults per program than did structural testing (α < .0007, c.l. 0.52 - 1.70). In the phase one data, the contrast of "0.5 * (reading + functional) - structural" estimates that the technique of structural testing detected 1.00 fewer faults per program than did either reading or functional testing (α < .0005, c.l. 0.31 - 1.69). In the phase one data, the contrast of "reading - functional" was not statistically different from zero (α > .05). The poor performance of structural testing across the phases suggests the inadequacy of using statement coverage criteria. The above pairs of contrasts were chosen because they are linearly independent.
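The contrasts can be made concrete as coefficient vectors over the three technique means. The sketch below uses made-up means purely for illustration (the study's estimates come from its fitted analysis-of-variance model, not from this raw arithmetic), and `contrast` and `dot` are hypothetical helpers. It also checks the property the text relies on: within each phase, the two coefficient vectors are orthogonal, which implies they are linearly independent.

```python
TECHNIQUES = ("reading", "functional", "structural")

# Coefficient vectors over (reading, functional, structural).
PHASE_THREE = {
    "reading - 0.5*(functional + structural)": (1.0, -0.5, -0.5),
    "functional - structural": (0.0, 1.0, -1.0),
}
PHASE_ONE = {
    "0.5*(reading + functional) - structural": (0.5, 0.5, -1.0),
    "reading - functional": (1.0, -1.0, 0.0),
}


def contrast(coeffs, means):
    """Estimate a contrast as the coefficient-weighted sum of technique means."""
    if abs(sum(coeffs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    return sum(c * means[t] for c, t in zip(coeffs, TECHNIQUES))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical mean faults detected per program (not the study's data).
means = {"reading": 5.0, "functional": 4.0, "structural": 3.0}
for name, coeffs in PHASE_THREE.items():
    print(name, contrast(coeffs, means))
# reading - 0.5*(functional + structural) 1.5
# functional - structural 1.0

for contrasts in (PHASE_THREE, PHASE_ONE):
    u, v = contrasts.values()
    assert dot(u, v) == 0.0  # orthogonal, hence linearly independent
```

Orthogonality is a stronger property than the independence the text mentions; with it, the two contrasts in a phase partition the technique effect into non-overlapping comparisons.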

4.1.3. Percentage of Faults Detected

Since the programs tested each had a different number of faults, a question in the earlier goal/question framework asks which technique detected the greatest percentage of faults in the programs. The order of performance of the techniques is the same as above when the percentages of the programs' faults detected are compared. The overall F-tests for phases one and three were rejected as before (α < .037 and α < .0001, respectively; not rejected in phase two, α > .05). Applying the same contrasts as above: a) in phase three, reading detected 10.0% more faults per program than did the other techniques (α < .0001, c.l. 9.0 - 22.1), and functional detected 11.2% more faults than did structural (α < .003, c.l. 4.1 - 18.3); b) in phase one, structural detected 13.2% fewer of a program's faults than did the other methods (α < .011, c.l. 3.5 - 22.9), and reading and functional were not statistically different, as before.

3 The probability of Type I error, that is, the probability of erroneously rejecting the null hypothesis, is reported. The abbreviation "c.l." stands for 95% confidence interval.

4.1.4. Dependence on Software Type

Another question in this goal area asks whether the number or percentage of faults detected depends on the program being tested. The overall F-test that the number of faults detected is not program dependent is rejected only in the phase three data (α < .0001). Applying Tukey's multiple comparison to the phase three data reveals that the most faults were detected in the abstract data type, the second most in the text formatter, and the fewest in the database maintainer (simultaneous α < .05). When the percentage of faults found in a program is considered, however, the overall F-tests for the three phases are all rejected (α < .027, α < .01, and α < .0001, in respective order). Tukey's multiple comparison yields the following orderings of the programs (all simultaneous α < .05). In the phase one data, the ordering was (data type = plotter) > text formatter; that is, a higher percentage of faults were detected in either the abstract data type or the plotter than were found in the text formatter, and there was no difference between the abstract data type and the plotter in the percentage found. In the phase two data, the ordering of percentage of faults detected was plotter > (text formatter = database maintainer). In the phase three data, the ordering of percentage of faults found in the programs was the same as for the number of faults found: abstract data type > text formatter > database maintainer. Summarizing the effect of the type of software on the percentage of faults observed: 1) the programs with the highest percentage of their faults detected were the abstract data type and the mathematical plotter, and the percentage detected between these two was not statistically different; 2) the programs with the lowest percentage of their faults detected were the text formatter and the database maintainer; the percentage detected between these two was not statistically different in the phase two data, but a higher percentage of faults in the text formatter was detected in the phase three data.


4.1.5. Observable vs. Observed Faults

One evaluation criterion of the success of a software testing session is the number of faults detected. An evaluation criterion of the particular test data generated, however, is the ability of the test data to reveal faults in the program. A test data set's ability to uncover faults in a program can be measured by the number or percentage of a program's faults that are made observable by execution on that input. Distinguishing the faults observable in a program from the faults actually observed by a tester highlights the differences between the activities of test data generation and program behavior examination. As shown in Figure 8, on average 68.0% of the programs' faults were observable when individuals were either functionally or structurally testing. Of course, with a nonexecution-based technique such as code reading, 100% of the faults are observable. Test data generated by subjects using the technique of functional testing resulted in 1.4 more observable faults (α < .0002, c.l. 0.79 - 2.01) than did the use of structural testing in phase one of the study; the percentage difference of functional over structural was estimated at 20.0% (α < .0002, c.l. 11.2 - 28.8). The techniques did not differ in these two measures in the third phase of the study. However, considering just the faults that were observable from the submitted test data, functional testers detected 18.5% more of these observable faults than did structural testers in the phase three data (α < .0018, c.l. 8.9 - 28.1); they did not differ in the phase one data. Note that all faults in the programs could be observed in the programs' output given the proper input data. When using the on-line techniques of functional and structural testing, subjects detected 70.3% of the faults observable in the program's output. In order to conduct a successful testing session, faults in a program must be both revealed and subsequently observed.
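The requirement that faults be both revealed and observed can be written as a per-session identity. In this sketch F, O, and D are symbols introduced here (not the paper's notation) for a program's total faults, the faults observable from the submitted inputs, and the faults the subject actually detected in an on-line session, assuming, as the measure definitions imply, that every detected fault was observable:

```latex
\underbrace{\frac{D}{F}}_{\text{fraction detected}}
  = \underbrace{\frac{O}{F}}_{\text{fraction observable}}
    \cdot
    \underbrace{\frac{D}{O}}_{\text{fraction of observable detected}},
\qquad 0 \le D \le O \le F,\; O > 0 .
```

Multiplying the two reported averages, 0.680 × 0.703 ≈ 0.478, gives only a rough sense of the combined effect, because the mean of a product generally differs from the product of the means.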

4.1.6. Dependence on Program Coverage

Another measure of the ability of a test set to reveal a program's faults is the percentage of a program's statements that are executed by the test set. The average maximum statement coverage achieved by the functional and structural testers was 97.0%. The maximum statement coverage from the submitted test data was not statistically different between the functional and structural testers (α > .05). Also, there was no correlation between maximum statement coverage achieved and either number or percentage of faults found (α > .05).


4.1.7. Dependence on Programmer Expertise

A final question in this goal area concerns the contribution of programmer expertise to fault detection effectiveness. In the phase three data from the NASA/CSC professional environment, subjects of advanced expertise detected more faults than did either the subjects of intermediate or junior expertise (α < .05). When the percentage of faults detected is compared, however, the advanced subjects performed better than the junior subjects (α < .05), but were not statistically different from the intermediate subjects (α > .05). The intermediate and junior subjects were not statistically different in any of the three phases of the study in terms of number or percentage of faults observed. When several subject background attributes were correlated with the number of faults found, total years of professional experience had a minor relationship (Pearson R = .22, α < .05). Correspondence of performance with background aspects was examined across all observations and within each of the phases, including previous academic performance for the University of Maryland subjects. Other than the above, no relationships were found.

4.1.8. Accuracy of Self-Estimates

Recall that the NASA/CSC subjects in the phase three data estimated, at the completion of a testing session, the percentage of a program's faults they thought they had uncovered. This estimate correlated reasonably well with the actual percentage of faults detected (R = .57, α < .0001). Investigating further, individuals using the different techniques differed in the accuracy of their estimates: code readers gave the best estimates (R = .79, α < .0001), structural testers gave the second best estimates (R = .57, α < .0007), and functional testers gave the worst estimates (no correlation, α > .05). This last observation suggests that the code readers were more certain of the effectiveness they had in revealing faults in the programs.
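The R values in this section are Pearson correlation coefficients. A from-scratch sketch is shown below with made-up self-estimates; `pearson_r` is a hypothetical helper, and it computes R only, not the significance level α. One point worth keeping in mind when reading "accuracy" here: R measures linear association, not calibration, so a subject who consistently overestimates by ten percentage points still scores R = 1.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


# Hypothetical "% faults felt found" vs. actual "% faults detected".
felt_found = [50, 60, 70, 80]
actual = [40, 50, 60, 70]  # always 10 points below the self-estimate
print(pearson_r(felt_found, actual))  # 1.0
```

On Python 3.10 and later, `statistics.correlation` computes the same coefficient.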

4.1.9. Dependence on Interactions

There were few significant interactions between the main effects of testing technique, program, and expertise level. In the phase two data, there was an interaction between testing technique and program in both the number and percentage of faults found (α < .0013 and α < .0014, respectively). The effectiveness of code reading increased on the text formatter. In the phase three data, there was a slight three-way interaction between testing technique, program, and expertise level for both the number and percentage of faults found (α < .05 and α < .04, respectively).

4.1.10. Summary of Fault Detection Effectiveness

Summarizing the major results of the comparison of fault detection effectiveness: 1) in the phase three data, code reading detected a greater number and percentage of faults than the other methods, with functional detecting more than structural; 2) in the phase one data, code reading and functional were equally effective, while structural was inferior to both; there were no differences among the three techniques in phase two; 3) the number of faults observed depends on the type of software: the most faults were detected in the abstract data type and the mathematical plotter, the second most in the text formatter, and (in the case of the phase three data) the fewest in the database maintainer; 4) functionally generated test data revealed more observable faults than did structurally generated test data in phase one, but not in phase three; 5) subjects of intermediate and junior expertise were equally effective in detecting faults, while advanced subjects found a greater number of faults than did either group; and 6) self-estimates of faults detected were most accurate from subjects applying code reading, followed by those doing structural testing, with estimates from persons functionally testing having no relationship to the faults actually detected.

4.2. Fault Detection Cost

The second goal area examines the fault detection cost of each of the techniques. Figure 12 presents a summary of the measures that were examined to investigate this goal area. A brief description of each measure is as follows; (*) marks a measure relevant only for on-line testing.
a) # Faults / hour - the number of faults detected by a subject applying a given technique, normalized by the effort in hours required; this is called the fault detection rate.
b) Detection time - the total number of hours that a subject spent testing a program using a technique.
c) Cpu-time (*) - the cpu-time in seconds used during the testing session.
d) Normalized cpu-time (*) - the cpu-time in seconds used during the testing session, normalized by a factor for machine speed.4
e) Connect time (*) - the number of minutes that an individual spent on-line while testing a program.
f) # Program runs (*) - the number of executions of the program test driver; note that the driver supported multiple sets of input data.
All of the on-line statistics were monitored by the operating systems of the machines.
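Measures a) and d) reduce to simple arithmetic. The sketch below assumes effort in hours and cpu-seconds are recorded per session; the divisors are the ones given in footnote 4 for the VAX 11/780 and IBM 4341, while the names `SPEED_DIVISOR`, `fault_detection_rate`, and `normalized_cpu_seconds` are hypothetical and the inputs are made up.

```python
# Machine-speed divisors stated in footnote 4 (from benchmark comparisons).
SPEED_DIVISOR = {"VAX 11/780": 1.8, "IBM 4341": 0.9}


def fault_detection_rate(faults_detected, hours):
    """Measure a): faults detected per hour of detection effort."""
    if hours <= 0:
        raise ValueError("detection time must be positive")
    return faults_detected / hours


def normalized_cpu_seconds(raw_seconds, machine):
    """Measure d): raw cpu-time divided by the machine's speed divisor."""
    return raw_seconds / SPEED_DIVISOR[machine]


print(fault_detection_rate(6, 2.5))                # 2.4
print(normalized_cpu_seconds(90.0, "VAX 11/780"))  # 50.0
print(normalized_cpu_seconds(45.0, "IBM 4341"))    # 50.0
```

Normalizing puts sessions run on the two machines on a common scale, so that cpu-time differences between techniques are not confounded with which machine a subject happened to use.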

4.2.1. Data Distributions

The actual distribution of the fault detection rates for the subjects appears in Figure 13, broken down by phase. Once again, note the many-to-one differential in subject performance. Figure 14 displays the mean fault detection rate for the subjects, broken down by technique, program, expertise level, and phase.

4.2.2. Fault Detection Rate and Total Time

The first question in this goal area asks which testing technique had the highest fault detection rate. The overall F-test of the techniques having the same fault detection rate was rejected in the phase three data (α < .0014), but not in the other two phases (α > .05). As before, the two contrasts of "reading - 0.5 * (functional + structural)" and "functional - structural" were examined to detect differences among the techniques. The technique of code reading was estimated to detect 1.49 more faults per hour than did the other techniques in the phase three data (α < .0003, c.l. 0.75 - 2.23). The techniques of functional and structural testing were not statistically different (α > .05). Comparing the total time spent in fault detection, the techniques were not statistically different in the phase two and three data; the overall F-test for the phase one data was rejected (α < .013). In the phase one data, structural testers spent an estimated 1.08 hours less testing than did subjects using the other techniques (α < .004, c.l. 0.39 - 1.78), while code readers were not statistically different from functional testers. Recall that in phase one, the structural testers observed both a lower number and a lower percentage of the programs' faults than did the other techniques.

4 In the phase three data, testing was done on both a VAX 11/780 and an IBM 4341. As suggested by benchmark comparisons [Church 84], the VAX cpu-times were divided by 1.8 and the IBM cpu-times were divided by 0.9.
4.2.3. Dependence on Software Type

Another question in this area focuses on how fault detection rate depends on software type. The overall F-test that the detection rate is the same for the programs is rejected in the phase one and phase three data (α < .01 and α < .0001, respectively); the detection rate among the programs was not statistically different in phase two. Applying Tukey's multiple comparisons to the phase one data finds that the fault detection rate was greater on the abstract data type than on the plotter, while there was no difference either between the abstract data type and the text formatter or between the text formatter and the plotter (simultaneous α < .05). In the phase three data, the fault detection rate was higher in the abstract data type than it was for the text formatter and the database maintainer, with the text formatter and the database maintainer not being statistically different (simultaneous α < .05). The overall effort spent in fault detection was different among the programs in phases one and three (α < .012 and α < .0001, respectively), while there was no difference in phase two. In phase one, more effort was spent testing the plotter than the abstract data type, while there was no statistical difference either between the plotter and the text formatter or between the text formatter and the abstract data type (simultaneous α < .05). In phase three, more time was spent testing the database maintainer than was spent on either the text formatter or the abstract data type, with the text formatter not differing from the abstract data type (simultaneous α < .05). Summarizing the dependence of fault detection cost on software type: 1) the abstract data type had a higher detection rate and less total detection effort than did either the plotter or the database maintainer, and the latter two were not different in either detection rate or total detection time; 2) the text formatter and the plotter did not differ in fault detection rate or total detection effort; 3) the text formatter and the database maintainer did not differ in fault detection rate overall and did not differ in total detection effort in phase two, but the database maintainer had a higher total detection effort in phase three; 4) the text formatter and the abstract data type did not differ in total detection effort overall and did not differ in fault detection rate in phase one, but the abstract data type had a higher detection rate in phase three.
4.2.4. Computer Costs

In addition to the effort spent by individuals in software testing, on-line methods incur machine costs. The machine cost measures of cpu-time, connect time, and the number of runs were compared across the on-line techniques of functional and structural testing in phase three of the study. A nonexecution-based technique such as code reading, of course, incurs no machine time costs. When the machine speeds are normalized (see measure definitions above), the technique of functional testing used 26.0 more seconds of cpu-time than did the technique of structural testing (α < .016, c.l. 7.0 - 45.0). The estimate of the difference is 29.8 seconds when the cpu-times are not normalized (α < .012, c.l. 9.0 - 50.2). Individuals using functional testing used 28.4 more minutes of connect time than did those using structural testing (α < .004, c.l. 11.7 - 45.1). The number of computer runs of a program's test driver was not different between the two techniques (α > .05). These results suggest that individuals using functional testing spent more time on-line and used more cpu-time per computer run than did those structurally testing.

4.2.5. Dependence on Programmer Expertise

The relation of programmer expertise to cost of fault detection is another question in this goal area. The expertise level of the subjects had no relation to the fault detection rate in phases two and three (α > .05 for both F-tests). Recall that phase three of the study used 32 professional subjects with all three levels of computer science expertise. In phase one, however, the intermediate subjects detected faults at a faster rate than did the junior subjects (α < .005). The total effort spent in fault detection was not different among the expertise levels in any of the phases (α > .05 for all three F-tests). When all 74 subjects are considered, years of professional experience correlates positively with fault detection rate (R = .41, α < .0002) and correlates slightly negatively with total detection time (R = -.25, α < .03). These last two observations suggest that persons with more years of professional experience detected the faults faster and spent less total time doing so. Several other subject background measures showed no relationship with fault detection rate or total detection time (α > .05). Background measures were examined across all subjects and within the groups of NASA/CSC subjects and University of Maryland subjects.

4.2.6. Dependence on Interactions

There were few significant interactions between the main effects of testing technique, program, and expertise level. There was an interaction between testing technique and software type in terms of fault detection rate and total detection cost for the phase three data (α < .003 and α < .007, respectively). Subjects using code reading on the abstract data type had an increased fault detection rate and a decreased total detection time.

4.2.7. Relationships Between Fault Detection Effectiveness and Cost

There were several correlations between fault detection cost measures and performance measures. Fault detection rate correlated overall with number of faults detected (R = .48, α < .0001), percentage of faults found (R = .48, α < .0001), and total detection time (R = -.53, α < .0001), but not with normalized cpu-time, raw cpu-time, connect time, or number of computer runs (α > .05). Total detection time correlated with normalized cpu-time (R = .36, α < .04) and raw cpu-time (R = .37, α < .04), but not with connect time, number of runs, number of faults detected, or percentage of faults detected. The number of faults detected in the programs correlated with the amount of machine resources used: normalized cpu-time (R = .47, α < .007), raw cpu-time (R = .52, α < .002), and connect time (R = .49, α < .003), but not with the number of computer runs (α > .05). The correlations for percentage of faults detected with machine resources used were similar. Although most of these correlations are minor, they suggest that 1) the higher the fault detection rate, the more faults found and the less time spent in fault detection; 2) fault detection rate had no relationship with use of machine resources; 3) spending more time in detecting faults had no relationship with the number of faults detected; and 4) the more cpu-time and connect time used, the more faults found.

4.2.8.  Summary  of  Fault  Detection  Cost 

Summarizing the major results of the comparison of fault detection cost: 1) in the
phase three data, code reading had a higher fault detection rate than the other methods,
with no difference between functional testing and structural testing; 2) in the phase one
and two data, the three techniques were not different in fault detection rate; 3) in the
phase two and three data, total detection effort was not different among the techniques,
but in phase one less effort was spent for structural testing than for the other techniques,
while reading and functional were not different; 4) fault detection rate and total
effort in detection depended on the type of software: the abstract data type had the
highest detection rate and lowest total detection effort, the plotter and the database
maintainer had the lowest detection rate and the highest total detection effort, and the
text formatter was somewhere in between depending on the phase; 5) functional testing
used more cpu-time and connect time than did structural testing, but they were not
different in the number of runs; 6) in phases two and three, subjects across expertise
levels were not different in fault detection rate or total detection time, while in phase one
intermediate subjects had a higher detection rate; and 7) there was a moderate correlation
between fault detection rate and years of professional experience across all subjects.

4.3.  Characterization  of  Faults  Detected 

The third goal area focuses on determining what classes of faults are detected by
the different techniques. In the earlier section on the faults in the software, the faults
were characterized by two different classification schemes: omission or commission, and
initialization, control, data, computation, interface, or cosmetic. The faults detected
across all three study phases are broken down by the two fault classification schemes in
Figure 15. The entries in the figure are the average percentage (with standard deviations)
of faults in a given class observed when a particular technique was being used.
Note that when a subject tested a program that had no faults in a given class, he/she
was excluded from the calculation of this average.

4.3.1.  Omission  vs.  Commission  Classification 

When the faults are partitioned according to the omission/commission scheme,
there is a distinction among the techniques. Both code readers and functional testers
observed more omission faults than did structural testers (α < .001), with code readers
and functional testers not being different (α > .05). Since a fault of omission occurs as a
result of some segment of code being left out, one would not expect structurally generated
test data to find such faults. In fact, 44% of the subjects applying structural testing
found zero faults of omission when testing a program. A distribution of the faults
observed according to this classification scheme appears in Figure 16.
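The point about omission faults can be illustrated with a small hypothetical Python function (not one of the study's programs). Its specification is assumed to require a result of 0.0 when the denominator is zero, but the guard for that case was never written. The single structural test case executes every statement the function contains (100% statement coverage) and passes; only a case chosen from the specification's boundary condition exposes the missing code.

```python
def safe_ratio(numerator: int, denominator: int) -> float:
    # Hypothetical specification: return 0.0 when denominator == 0.
    # The guard for that case was left out, which is a fault of omission.
    return numerator / denominator


def failing_cases(suite: list[tuple[tuple[int, int], float]]) -> list[tuple[int, int]]:
    """Run each (arguments, expected result) pair; collect the failing inputs."""
    failures = []
    for args, expected in suite:
        try:
            if safe_ratio(*args) != expected:
                failures.append(args)
        except ZeroDivisionError:
            failures.append(args)
    return failures


# One case already executes the function's only statement (100% coverage).
structural_suite = [((6, 3), 2.0)]
# A specification-derived suite adds the boundary value named in the spec.
functional_suite = [((6, 3), 2.0), ((6, 0), 0.0)]

print(failing_cases(structural_suite))  # []
print(failing_cases(functional_suite))  # [(6, 0)]
```

Statement coverage measures only the code that is present, so a missing statement leaves no uncovered line to prompt an extra test case.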

4.3.2.  Six-Part  Fault  Classification 

When the faults are divided according to the second fault classification scheme,
several differences are apparent. Both code reading and functional testing found more
initialization faults than did structural testing (α < .05), with code reading and functional
testing not being different (α > .05). Code reading detected more interface faults than
did either of the other methods (α < .01), with no difference between functional and
structural testing (α > .05). This suggests that the code reading process of abstracting
and composing program functions across modules is an effective technique for
finding interface faults. Functional testing detected more control faults than did either
of the other methods (α < .01), with code reading and structural testing not being
different (α > .05). Recall that the structural test data generation criterion examined is
based on determining the execution paths in a program and deriving test data that
execute 100% of the program's statements. One would expect that more control path
faults would be found by such a technique. However, structural testing did not do as
well as functional testing in this fault class. The technique of code reading found more
computation faults than did structural testing (α < .05), with functional testing not
being different from either of the other two methods (α > .05). The three techniques were
not statistically different in the percentage of faults they detected in either the data or
cosmetic fault classes (α > .05 for both). A distribution of the faults observed according
to this classification scheme appears in Figure 17.

4.3.3.  Observable  Fault  Classification 

Figure 18 displays the average percentage (with standard deviations) of faults from
each class that were observable from the test data submitted, yet were not reported by
the tester.⁵ The two on-line techniques of functional and structural testing were not
different in any of the fault classes (α > .05). Note that there was only one fault in the
cosmetic class.

⁵ The standard deviations presented in the figure are high because of the several
instances in which all observable faults were reported.

4.3.4.  Summary  of  Characterization  of  Faults  Detected 

Summarizing the major results of the comparison of classes of faults detected: 1)
code reading and functional testing both detected more omission faults and initialization
faults than did structural testing; 2) code reading detected more interface faults than
did the other methods; 3) functional testing detected more control faults than did the
other methods; 4) code reading detected more computation faults than did structural
testing; and 5) the on-line techniques of functional and structural testing were not
different in any classes of faults observable but not reported.

5.  Conclusions 

This study compares the strategies of code reading by stepwise abstraction, functional
testing using equivalence class partitioning and boundary value analysis, and
structural testing using 100% statement coverage. The study evaluates the techniques
across three data sets in three different aspects of software testing: fault detection
effectiveness, fault detection cost, and classes of faults detected. Each of the three testing
techniques showed merit in this evaluation. The investigation is intended to compare
the different testing strategies in representative testing situations, using programmers
with a wide range of experience, different software types, and common software
faults.

The major results of this study are 1) with the professional programmers, code
reading detected more software faults and had a higher fault detection rate than did
functional or structural testing, while functional testing detected more faults than did
structural testing, but functional and structural testing were not different in fault
detection rate; 2) in one UoM subject group, code reading and functional testing were not
different in faults found, but were both superior to structural testing, while in the other
UoM subject group there was no difference among the techniques; 3) with the UoM
subjects, the three techniques were not different in fault detection rate; 4) number of faults
observed, fault detection rate, and total effort in detection depended on the type of
software tested; 5) code reading detected more interface faults than did the other methods;
6) functional testing detected more control faults than did the other methods; and 7)
when asked to estimate the percentage of faults detected, code readers gave the most
accurate estimates while functional testers gave the least accurate estimates.

The results suggest that code reading by stepwise abstraction (a nonexecution-based
method) is at least as effective as on-line functional and structural testing in
terms of number and cost of faults observed. They also suggest the inadequacy of using
100% statement coverage criteria for structural testing. Note that the professional
programmers examined preferred the use of functional testing because they felt it was the
most effective technique; their intuition, however, turned out to be incorrect.

In comparing the results to related studies, there are mixed conclusions. A prototype
analysis done at the University of Maryland in the Fall of 1981 [Hwang 81] supported
the belief that code reading by stepwise abstraction does as well as the
computer-based methods, with each strategy having its own advantages. In the Myers
experiment [Myers 78], the three techniques compared (functional testing, 3-person code
reviews, control group) were equally effective. He also calculated that code reviews were
less cost-effective than the computer-based testing approaches. The first observation is
supported in one study phase here, but the other observation is not. A study conducted
by Hetzel [Hetzel 78] compared functional testing, code reading, and “selective” testing
(a composite of functional, structural, and reading techniques). He observed that
functional and “selective” testing were equally effective, with code reading being inferior.
As noted earlier, this is not supported by this analysis. The study described in this
analysis examined the technique of code reading by stepwise abstraction, while both the
Myers and Hetzel studies examined alternate approaches to off-line (nonexecution-based)
review/reading.

A few remarks are appropriate about the comparison of the cost-effectiveness and
phase-availability of these testing techniques. When examining the effort associated
with a technique, both fault detection and fault isolation costs should be compared.
The code readers have both detected and isolated a fault; they located it in the source
code. Thus, the reading process condenses fault detection and isolation into one activity.
Functional and structural testers have only detected a fault; they need to delve into
the source code and expend additional effort in order to isolate the defect. Also, a
nonexecution-based reading process can be applied to any document produced during
the development process (e.g., high-level design document, low-level design document,
source code document), whereas functional and structural execution-based techniques
may only be applied to documents that are executable (e.g., source code), which are
usually available later in the development process.

Investigations related to this work include studies of fault classification [Weiss &
Basili 85, Johnson, Draper & Soloway 83, Ostrand & Weyuker 83, Basili & Perricone 84]
and Cleanroom software development [Selby, Basili & Baker 85]. In the Cleanroom software
development approach, techniques such as code reading are used in the development
of software completely off-line (i.e., without program execution). In the latter
study, systems developed using Cleanroom met system requirements more completely
and had a higher percentage of successful operational test cases than did systems
developed with a more traditional approach.

The empirical study presented is intended to advance the understanding of how
various software testing strategies contribute to the software development process and
to one another. The results given were calculated from a set of individuals applying the
three techniques to unit-sized programs; the direct extrapolation of the findings to other
testing environments is not implied. However, valuable insights into software testing
have been gained. Further work applying these and other results to devise effective
testing environments is underway.
7. Appendices

7.1.  Appendix  A.  The  Specifications  for  the  Programs 

Program  1 

Given an input text of up to 80 characters consisting of words separated by blanks
or new-line characters, the program formats it into a line-by-line form such that 1) each
output line has a maximum of 30 characters, 2) a word in the input text is placed on a
single output line, and 3) each output line is filled with as many words as possible.

The input text is a stream of characters, where the characters are categorized as
either break or nonbreak characters. A break character is a blank, a new-line character
(&), or an end-of-text character (/). New-line characters have no special significance;
they are treated as blanks by the program. The characters & and / should not appear
in the output.

A word is defined as a nonempty sequence of nonbreak characters. A break is a
sequence of one or more break characters and is reduced to a single blank character or
start of a new line in the output.

When the program is invoked, the user types the input line, followed by a / (end-of-text)
and a carriage return. The program then echoes the text input and formats it on
the terminal.

If the input text contains a word that is too long to fit on a single output line, an
error message is typed and the program terminates. If the end-of-text character is
missing, an error message is issued and the program awaits the input of a properly
terminated line of text.
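To make the three filling rules concrete, here is a minimal Python sketch of the behavior this specification describes. It is an illustration under stated assumptions, not a transcription of the Fortran program in Appendix B (which contains the faults used in the study): the interactive echo and the re-prompt after a missing end-of-text mark are replaced by a returned list and a raised ValueError, the names MAX_LINE and format_text are hypothetical, and the 80-character input limit is not checked.

```python
MAX_LINE = 30  # maximum output line length (from the specification)


def format_text(text: str) -> list[str]:
    """Fill the words of text into lines of at most MAX_LINE characters.

    Raises ValueError for the specification's two error cases: a missing
    end-of-text mark ('/') and a word too long for one output line.
    """
    if "/" not in text:
        raise ValueError("no end of text mark")
    body = text[: text.index("/")]      # characters after '/' are ignored
    body = body.replace("&", " ")       # new-line is treated as a blank
    lines: list[str] = []
    current = ""
    for word in body.split():           # a run of breaks acts as one break
        if len(word) > MAX_LINE:
            raise ValueError("word too long")
        if not current:
            current = word
        elif len(current) + 1 + len(word) <= MAX_LINE:
            current += " " + word        # word fits after a single blank
        else:
            lines.append(current)        # start a new output line
            current = word
    if current:
        lines.append(current)
    return lines


for line in format_text("the quick brown fox&jumps over   the lazy dog/"):
    print(line)
```

The first output line here is exactly 30 characters long, which exercises the boundary of rule 1).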


Program  2 

Given ordered pairs (x,y) of either positive or negative integers as input, the program
plots them on a grid with a horizontal x-axis and a vertical y-axis which are
appropriately labeled. A plotted point on the grid should appear as an asterisk (*).

The vertical and horizontal scaling is handled as follows. If the maximum absolute
value of any y-value is less than or equal to twenty (20), the scale for vertical spacing
will be one line per integral unit (e.g., the point (3,6) should be plotted on the sixth line;
two lines above the point (3,4)). Note that the origin (point (0,0)) would correspond to
an asterisk at the intersection of the axes (the x-axis is referred to as the 0th line).
If the maximum absolute value of any x-value is less than or equal to thirty (30), the
scale for horizontal spacing will be one space per integral unit (e.g., the point (4,5)
should be plotted four spaces to the right of the y-axis; two spaces to the right of (2,5)).
However, if the maximum absolute value of any y-value is greater than twenty (20), the
scale for vertical spacing will be one line per every (max abs of yval)/20 rounded up.
(E.g., if the maximum absolute value of any y-value to be plotted is 66, the vertical line
spacing will be a line for every four (4) integral units. In such a data set, points with
y-values greater than or equal to eight and less than twelve will show up as asterisks in
the second line, points with y-values greater than or equal to twelve and less than sixteen
will show up as asterisks in the third line, etc. Continuing the example, the point
(3,15) should be plotted on the third line; two lines above the point (3,5).) Horizontal
scaling is handled analogously.

If two or more of the points to be plotted would show up as the same asterisk in
the grid (like the points (9,13) and (9,15) in the above example), a number '2' (or
whatever number is appropriate) should be printed instead of the asterisk. Points whose
asterisks will lie on an axis or grid marker should show up in place of the marker.
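The scaling and coincidence rules can be summarized in a short Python sketch. It only computes where each point lands and which symbol it gets, without drawing the grid; the helper names are hypothetical, and because the specification's worked example covers only nonnegative values, the treatment of negative values (mirroring the rule around the axis) is an assumption. The usage lines reproduce the specification's example, in which the largest y-value is 66.

```python
from collections import Counter

HEIGHT = 20  # y-values up to this magnitude get one line per unit
WIDTH = 30   # x-values up to this magnitude get one column per unit


def spacing(values: list[int], limit: int) -> int:
    """Units per line (or column): 1 if max |v| <= limit, else max|v|/limit rounded up."""
    biggest = max(abs(v) for v in values)
    return 1 if biggest <= limit else (biggest + limit - 1) // limit  # integer ceiling


def offset(v: int, step: int) -> int:
    """Lines (or columns) from the axis at which value v is drawn.

    Nonnegative values follow the specification's example (round down).
    Mirroring the rule for negative values is an assumption.
    """
    return v // step if v >= 0 else -((-v) // step)


def plot_cells(points: list[tuple[int, int]]) -> Counter:
    """Count how many points land on each (column, line) grid cell."""
    xstep = spacing([x for x, _ in points], WIDTH)
    ystep = spacing([y for _, y in points], HEIGHT)
    return Counter((offset(x, xstep), offset(y, ystep)) for x, y in points)


def symbol(count: int) -> str:
    """An asterisk for one point, otherwise the number of coinciding points."""
    return "*" if count == 1 else str(count)


cells = plot_cells([(3, 15), (3, 5), (9, 13), (9, 15), (0, 66)])
for cell, count in sorted(cells.items()):
    print(cell, symbol(count))
```

With a largest y-value of 66 the vertical step is 4, so (3,15) lands two lines above (3,5), and (9,13) and (9,15) share a cell that is printed as '2'.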

Program 3

A list is defined to be an ordered collection of integer elements which may have
elements annexed and deleted at either end, but not in the middle. The operations that
need to be available are ADDFIRST, ADDLAST, DELETEFIRST, DELETELAST,
FIRST, ISEMPTY, LISTLENGTH, REVERSE, and NEWLIST. Each operation is
described in detail below. The lists are to contain up to a maximum of five (5)
elements. If an element is added to the front of a “full” list (one containing five elements
already), the element at the back of the list is to be discarded. Elements to be added to
the back of a full list are discarded. Requests to delete elements from empty lists result
in an empty list, and requests for the first element of an empty list result in zero (0)
being returned. The detailed operation descriptions are as below:

ADDFIRST(LIST L, INTEGER I)

Returns the list L with I as its first element followed by all the elements of L. If L
is “full” to begin with, L's last element is lost.

ADDLAST(LIST L, INTEGER I)

Returns the list with all of the elements of L followed by I. If L is full to begin
with, L is returned (i.e., I is ignored).

DELETEFIRST(LIST L)

Returns the list containing all but the first element of L. If L is empty, then an
empty list is returned.

DELETELAST(LIST L)

Returns the list containing all but the last element of L. If L is empty, then an
empty list is returned.

FIRST(LIST L)

Returns the first element in L. If L is empty, then it returns zero (0).

ISEMPTY(LIST L)

Returns one (1) if L is empty, zero (0) otherwise.

LISTLENGTH(LIST L)

Returns the number of elements in L. An empty list has zero (0) elements.

NEWLIST(LIST L)

Returns an empty list.

REVERSE(LIST L)

Returns a list containing the elements of L in reverse order.
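The boundary behavior of this abstract data type is compact enough to state as executable code. The Python sketch below is one reading of the specification, not the Fortran implementation in Appendix B (which keeps its elements in a shared POOL array): lists are modelled as immutable tuples so that each operation returns a new list, and the lowercase function names simply mirror the operation names above.

```python
LISTSZ = 5  # maximum list length, as in the specification

# Lists are modelled as immutable tuples so that each operation
# "returns" a new list, as the specification's wording suggests.
List = tuple[int, ...]


def newlist() -> List:
    return ()


def addfirst(lst: List, i: int) -> List:
    return ((i,) + lst)[:LISTSZ]          # a full list loses its last element


def addlast(lst: List, i: int) -> List:
    return lst if len(lst) >= LISTSZ else lst + (i,)  # full: i is ignored


def deletefirst(lst: List) -> List:
    return lst[1:]                        # an empty list stays empty


def deletelast(lst: List) -> List:
    return lst[:-1]                       # an empty list stays empty


def first(lst: List) -> int:
    return lst[0] if lst else 0


def isempty(lst: List) -> int:
    return 1 if not lst else 0


def listlength(lst: List) -> int:
    return len(lst)


def reverse(lst: List) -> List:
    return lst[::-1]


full = newlist()
for value in (1, 2, 3, 4, 5):
    full = addlast(full, value)
print(addfirst(full, 0))   # (0, 1, 2, 3, 4)
print(addlast(full, 6))    # (1, 2, 3, 4, 5)
```

The asymmetry between the two insertions is the main trap in this specification: adding at the front of a full list displaces an element, while adding at the back is silently ignored.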


Program  4 

(Note that a 'file' is the same thing as an IBM 'dataset'.)

The program maintains a database of bibliographic references. It first reads a master
file of current references, then reads a file of reference updates, merges the two, and
produces an updated master file and a cross reference table of keywords.

The  first  input  file,  the  master,  contains  records  of  74  characters  with  the  following 
format: 

column   comment

1-3      each reference has a unique reference key
4-14     author of publication
15-72    title of publication
73-74    year issued

The key should be a three (3) character unique identifier consisting of letters between
A-Z. The next input file, the update file, contains records of 75 characters in length.
The only difference from a master file record is that an update record has either an 'A'
(capital A meaning add) or an 'R' (capital R meaning replace) in column 75. Both the
master and update files are expected to be already sorted alphabetically by reference key
when read into the program. Update records with action replace are substituted for the
matching key record in the master file. Records with action add are added to the master
file at the appropriate location so that the file remains sorted on the key field. For
example, a valid update record to be read would be (including a numbered line just for
reference)

123456789012345678901234567890123456789012345678901234567890123456789012345
BITbaker      an introduction to program testing                        83A

The program should produce two pieces of output. It should first print the sorted
list of records in the updated master file in the same format as the original master file.
It should then print a keyword cross reference list. All words greater than three
characters in a publication's title are keywords. These keywords are listed alphabetically
followed by the key fields from the applicable updated master file entries. For example, if
the updated master file contained two records,

ABCkermit     introduction to software testing                          82

DDXjones      the realities of software management                      81

then the keywords are introduction, testing, realities, software, and management. The
cross reference list should look like

introduction
    ABC
management
    DDX
realities
    DDX
software
    ABC
    DDX
testing
    ABC

Some possible error conditions that could arise and the subsequent actions include
the following. The master and update files should be checked for sequence, and if a
record out of sequence is found, a message similar to 'key ABC out of sequence' should
appear and the record should be discarded. If an update record indicates replace and
the matching key cannot be found, a message similar to 'update key ABC not found'
should appear and the update record should be ignored. If an update record indicates
add and a matching key is found, something like 'key ABC already in file' should
appear and the record should be ignored. (End of specification.)
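Both outputs of this program can be expressed compactly in Python. The sketch below illustrates the specification rather than reproducing the program in Appendix B. It uses a dictionary and a final sort instead of the sequential two-file merge a batch program would perform. The helper record() for building fixed-column records is hypothetical, and treating a key equal to its predecessor as out of sequence is an assumption that the specification does not spell out. The usage lines rebuild the specification's two-record example.

```python
from collections import defaultdict

# Column layout from the specification (1-based, inclusive): key 1-3,
# author 4-14, title 15-72, year 73-74, update action in column 75.
KEY, TITLE, ACTION = slice(0, 3), slice(14, 72), 74


def record(key: str, author: str, title: str, year: str, action: str = "") -> str:
    """Build a fixed-column record (hypothetical helper for the examples)."""
    return key + author.ljust(11)[:11] + title.ljust(58)[:58] + year + action


def in_sequence(records: list[str], messages: list[str]) -> list[str]:
    """Drop records whose key does not follow the previous key (assumed strict)."""
    kept: list[str] = []
    for rec in records:
        if kept and rec[KEY] <= kept[-1][KEY]:
            messages.append(f"key {rec[KEY]} out of sequence")
        else:
            kept.append(rec)
    return kept


def merge(master: list[str], updates: list[str]) -> tuple[list[str], list[str]]:
    """Apply add ('A') and replace ('R') updates; return new master and messages."""
    messages: list[str] = []
    table = {rec[KEY]: rec[:74] for rec in in_sequence(master, messages)}
    for upd in in_sequence(updates, messages):
        key, action = upd[KEY], upd[ACTION]
        if action == "R" and key not in table:
            messages.append(f"update key {key} not found")
        elif action == "A" and key in table:
            messages.append(f"key {key} already in file")
        elif action in ("A", "R"):
            table[key] = upd[:74]  # stored in master-file format
    return [table[k] for k in sorted(table)], messages


def cross_reference(master: list[str]) -> list[tuple[str, list[str]]]:
    """Keywords (title words longer than three characters) with their keys."""
    index: dict[str, list[str]] = defaultdict(list)
    for rec in master:
        for word in sorted(set(rec[TITLE].split())):
            if len(word) > 3:
                index[word].append(rec[KEY])
    return sorted(index.items())


master = [
    record("ABC", "kermit", "introduction to software testing", "82"),
    record("DDX", "jones", "the realities of software management", "81"),
]
new_master, messages = merge(
    master, [record("BIT", "baker", "an introduction to program testing", "83", "A")]
)
for keyword, keys in cross_reference(master):
    print(keyword, " ".join(keys))
```

Because keys are appended in master-file order and the master file is kept sorted, the keys under each keyword also come out in sorted order.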

7.2.  Appendix  B.  The  Source  Code  for  the  Programs 

Program  1 

C     NOTE THAT YOU DO NOT NEED TO VERIFY THE FUNCTION 'MATCH'.
C     IT IS DESCRIBED THE FIRST TIME IT IS USED, AND ITS SOURCE CODE
C     IS INCLUDED AT THE END FOR COMPLETENESS.
C
C     NOTE THAT FORMAT STATEMENTS FOR WRITE STATEMENTS INCLUDE A LEADING
C     AND REQUIRED ' ' FOR CARRIAGE CONTROL

C     VARIABLE USED IN FIRST, BUT NEEDS TO BE INITIALIZED
      INTEGER MOREIN

C     STORAGE USED BY GCHAR
      INTEGER BCOUNT
      CHARACTER*1 GBUFER(80)
      CHARACTER*80 GBUF
C     GBUFER AND GBUFSTR ARE EQUIVALENCED

C     STORAGE USED BY PCHAR
      INTEGER I
      CHARACTER*1 OUTLIN(31)
C     OUTLIN AND OUTLINST ARE EQUIVALENCED

      CHARACTER*1 GCHAR

C     CONSTANT USED THROUGHOUT THE PROGRAM
      CHARACTER*1 EOTEXT, BLANK, LINEFD
      INTEGER MAXPOS

      COMMON /ALL/ MOREIN, BCOUNT, I, MAXPOS, OUTLIN,
     X  EOTEXT, BLANK, LINEFD, GBUFER, GBUF

      DATA EOTEXT, BLANK, LINEFD, MAXPOS /'/', ' ', '&', 31/
      CALL FIRST
      END




      SUBROUTINE FIRST
      INTEGER K, FILL, BUFPOS
      CHARACTER*1 CW
      CHARACTER*1 BUFFER(31)

      INTEGER MOREIN, BCOUNT, I, MAXPOS
      CHARACTER*1 OUTLIN(31), GCHAR, EOTEXT, BLANK, LINEFD,
     X  GBUFER(80)
      CHARACTER*80 GBUF
      COMMON /ALL/ MOREIN, BCOUNT, I, MAXPOS, OUTLIN,
     X  EOTEXT, BLANK, LINEFD, GBUFER, GBUF

      BUFPOS = 0
      FILL = 0
      CW = ' '

      MOREIN = 1
      I = 1
      K = 1
      DOWHILE (K .LE. MAXPOS)
         OUTLIN(K) = ' '
         K = K + 1
      ENDDO

      BCOUNT = 1
      K = 1
      DOWHILE (K .LE. 80)
         GBUFER(K) = 'Z'
         K = K + 1
      ENDDO

      DOWHILE (MOREIN)
         CW = GCHAR()
         IF ((CW .EQ. BLANK) .OR. (CW .EQ. LINEFD) .OR.
     X       (CW .EQ. EOTEXT)) THEN
            IF (CW .EQ. EOTEXT) THEN
               MOREIN = 0
            END IF
            IF ((FILL+1+BUFPOS) .LE. MAXPOS) THEN
               CALL PCHAR(BLANK)
               FILL = FILL + 1
            ELSE
               CALL PCHAR(LINEFD)
               FILL = 0
            END IF
            K = 1
            DOWHILE (K .LE. BUFPOS)




               CALL PCHAR(BUFFER(K))
               K = K + 1
            ENDDO
            FILL = FILL + BUFPOS
            BUFPOS = 0
         ELSE
            IF (BUFPOS .EQ. MAXPOS) THEN
               WRITE(6,10)
   10          FORMAT(' ','***WORD TO LONG***')
               MOREIN = 1
            ELSE
               BUFPOS = BUFPOS + 1
               BUFFER(BUFPOS) = CW
            END IF
         END IF
      ENDDO
      CALL PCHAR(LINEFD)
      END


      CHARACTER*1 FUNCTION GCHAR()
      INTEGER MATCH
      CHARACTER*80 GBUFSTR

      INTEGER MOREIN, BCOUNT, I, MAXPOS
      CHARACTER*1 OUTLIN(31), EOTEXT, BLANK, LINEFD,
     X  GBUFER(80)
      CHARACTER*80 GBUF
      COMMON /ALL/ MOREIN, BCOUNT, I, MAXPOS, OUTLIN,
     X  EOTEXT, BLANK, LINEFD, GBUFER, GBUF

      EQUIVALENCE (GBUFSTR, GBUFER)


      IF (GBUFER(1) .EQ. 'Z') THEN
         READ(5,20) GBUF
   20    FORMAT(A80)
C
C     MATCH(CARRAY,C) RETURNS 1 IF CHARACTER C IS IN CHARACTER ARRAY
C     CARRAY, RETURNS 0 OTHERWISE. ARSIZE IS THE SIZE OF CARRAY.
C
         IF (MATCH(GBUF,EOTEXT) .EQ. 0) THEN
            WRITE(6,30)
   30       FORMAT(' ','***NO END OF TEXT MARK***')
            GBUFER(2) = EOTEXT
         ELSE
C           GBUFER(1) = GBUF
            GBUFSTR = GBUF
         END IF
      END IF
      GCHAR = GBUFER(BCOUNT)
      BCOUNT = BCOUNT + 1
      END


      SUBROUTINE PCHAR (C)
      CHARACTER*1 C
      CHARACTER*31 SOUT, OUTLINST
      INTEGER K

      INTEGER MOREIN, BCOUNT, I, MAXPOS
      CHARACTER*1 OUTLIN(31), GCHAR, EOTEXT, BLANK, LINEFD,
     X  GBUFER(80)
      CHARACTER*80 GBUF
      COMMON /ALL/ MOREIN, BCOUNT, I, MAXPOS, OUTLIN,
     X  EOTEXT, BLANK, LINEFD, GBUFER, GBUF

      EQUIVALENCE (OUTLINST, OUTLIN)

      IF (C .EQ. LINEFD) THEN
         SOUT = OUTLINST
         WRITE(6,40) SOUT
   40    FORMAT(' ',A31)
         K = 1
         DOWHILE (K .LE. MAXPOS)
            OUTLIN(K) = ' '
            K = K + 1
         ENDDO
         I = 1
      ELSE
         OUTLIN(I) = C
         I = I + 1
      END IF
      END


Program  2 


INT WIDTH = 30,
    HEIGHT = 20,
    GRIDWD = 61,
    LARGENUM = 100000000
STRING TICKS[61] =
   -I- -|. -|. -|. -|- -|. -|- -|- ‘I* -I* -I- ‘I’

PROC SORT (INT ARRAY KEYBUF, INT ARRAY FREEBUF, INT N)

   INT I, MAXP
   INT ARRAY SRTKEYB(100), SRTFREEB(100)

   I := 0
   WHILE I < N DO
      SRTKEYB(I) := KEYBUF(I)
      SRTFREEB(I) := FREEBUF(I)
      I := I + 1
   END

   I := N
   WHILE I > 0 DO
      MAXP := MAXELE(SRTKEYB, I)
      KEYBUF(N-I) := SRTKEYB(MAXP)
      FREEBUF(N-I) := SRTFREEB(MAXP)
      CALL REMOVE(SRTKEYB, MAXP, I)
      CALL REMOVE(SRTFREEB, MAXP, I)
      I := I - 1
   END


INT FUNC MAXELE (INT ARRAY BUF, INT N)

   INT I, MAXPTR, MAX

   MAXPTR := -1
   MAX := -LARGENUM
   I := 0
   WHILE I < N DO
      IF BUF(I) > MAX
      THEN
         MAX := BUF(I)
         MAXPTR := I
      END
      I := I + 1
   END
   RETURN (MAXPTR)

INT FUNC MINELE (INT ARRAY BUF, INT N)

   INT I, MINPTR, MIN

   MINPTR := -1
   MIN := LARGENUM
   I := 0
   WHILE I < N DO
      IF BUF(I) < MIN
      THEN
         MIN := BUF(I)
         MINPTR := I
      END
      I := I + 1
   END
   RETURN (MINPTR)


PROC REMOVE (INT ARRAY BUF, INT PTR, INT N)

   INT I

   I := PTR
   WHILE I < N-1 DO
      BUF(I) := BUF(M-l)
      I := I + 1
   END

INT FUNC ABS (INT VAL)

   IF VAL < 0
   THEN
      RETURN (-VAL)
   ELSE
      RETURN (VAL)
   END


INT FUNC SLASH (INT TOP, INT BOT)

   INT RES

   RES := TOP/BOT
   IF TOP <> RES*BOT .AND.
      (TOP > 0 .AND. BOT > 0 .OR. TOP < 0 .AND. BOT < 0)
   THEN RES := RES + 1
   END
   RETURN (RES)

INT FUNC MOD (INT N, INT M)

   INT VAL

   VAL := N-N/M*M
   IF VAL < 0
   THEN
      VAL := VAL + M
   END
   RETURN (VAL)

PROC MAIN

   CHAR ARRAY GRID(61)
   STRING STR[61]
   INT ARRAY XVAL(100), YVAL(100)
   INT I, J, NUMOBS, MAXY, MAXX, MINX, HORISP, VERTSP, VLINE

   I := 0
   WHILE .NOT. EOI DO
      READ(XVAL(I), YVAL(I))
      I := I + 1
   END
   NUMOBS := I

   CALL SORT(YVAL, XVAL, NUMOBS)
   MAXY := YVAL(0)
   VERTSP := SLASH(MAXY, HEIGHT)


MAXX  XVAL(MAXELE(XVAL,NUMOBS)) 

MINX  :==  XVAL(M3NELE(XVAL,NLJMOBS)) 

IF  ABS(MINX)  >  ABS(MAXX) 

THEN 

HORISP  :=  SLASH(ABS(MINX), WIDTH) 

ELSE 

HORISP  :=  SLASH(ABS(MAXX),  WIDTH) 

END 

STR := ' X AXIS'

WRITE(STR, SKIP)

I := 0

VLINE := HEIGHT
WHILE  VLINE  >  0  DO 

J  :=  0 

IF MOD(VLINE, 5) = 0
THEN 

UNPACK(TICKS, GRID)

ELSE 

WHILE  J  <  GRIDWD  DO 
GRID(J) 

J := J + 1
END 
END 

VLINE := VLINE - 1

WHILE  VLINE  *  VERTSP  <  YVAL(I)  DO 
IF XVAL(I) >= 0
THEN

GRID(WIDTH + SLASH(XVAL(I), HORISP))
ELSE

GRID(WIDTH - SLASH(-XVAL(I), HORISP))
END

I := I + 1
END 

GRID(WIDTH) := '|'

PACK(GRID, STR)

WRITE(STR,SKIP) 

END 

STR :=


UNPACK(STR, GRID)

WHILE 0 <= YVAL(I) .AND. I IMOBS DO

IF XVAL(I) >= 0
THEN

GRID(WIDTH + SLASH(XVAL(I), HORISP)) :=
ELSE

GRID(WIDTH - SLASH(-XVAL(I), HORISP)) :=


189:

190: PACK(GRID, STR)

191:  WRITE(STR,SKIP) 

192: STR := ' Y AXIS'

193: WRITE(STR, SKIP)

194: 

198:  START  MAIN 
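MAIN uses SLASH to compute the scale factors VERTSP and HORISP, and it uses MOD to place tick marks on every fifth line. Read literally, and assuming that `/` on integers truncates toward zero (as FORTRAN 77 integer division does; the listing does not state the rule for this language), SLASH returns the ceiling of TOP/BOT. Under the same assumption, MOD returns a remainder in the range 0 to M-1 when M is positive. The minimal Python sketch below transcribes both routines line by line so that this reading can be checked against Python's floor-based operators. The helper `trunc_div` is a hypothetical name introduced here and is not part of the program.

```python
def trunc_div(top: int, bot: int) -> int:
    """Integer division rounded toward zero (assumed meaning of '/' in the listing)."""
    q = abs(top) // abs(bot)
    return q if (top >= 0) == (bot > 0) else -q


def slash(top: int, bot: int) -> int:
    """Transcription of SLASH: truncate, then add 1 when a positive quotient was inexact."""
    res = trunc_div(top, bot)
    if top != res * bot and ((top > 0 and bot > 0) or (top < 0 and bot < 0)):
        res += 1
    return res


def mod(n: int, m: int) -> int:
    """Transcription of MOD: truncated remainder, shifted up by M when negative."""
    val = n - trunc_div(n, m) * m
    if val < 0:
        val += m
    return val


if __name__ == "__main__":
    # -(-a // b) is Python's ceiling division; % is floor-based, so it is non-negative for b > 0.
    for top in range(-20, 21):
        for bot in [b for b in range(-6, 7) if b != 0]:
            assert slash(top, bot) == -(-top // bot)
    for n in range(-20, 21):
        for m in range(1, 7):
            assert mod(n, m) == n % m
    print(slash(7, 2), slash(-7, 2), mod(-7, 5))  # 4 -3 3
```

An exhaustive comparison over a small input range, as in the `__main__` block, is one inexpensive way to confirm what such helper routines compute before reasoning about the plotting logic that depends on them.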

Program 3

001: C NOTE THAT YOU DO NOT NEED TO VERIFY THE FUNCTIONS DRIVER, GETARG,

002: C CHAREQ, CODE, AND PRINT. THEIR SOURCE CODE IS DESCRIBED AND

003: C INCLUDED AT THE END FOR COMPLETENESS.

004: C NOTE THAT FORMAT STATEMENTS FOR WRITE STATEMENTS INCLUDE A LEADING

005: C AND REQUIRED ' ' FOR CARRIAGE CONTROL
006:  C 

007:  INTEGER  POOL(7),  LSTEND 

008:  INTEGER  LISTSZ 

009:  C 

010:  COMMON  /ALL/  LISTSZ 

011: C
012:  C 

013: LISTSZ = 5

014: CALL DRIVER (POOL, LSTEND)

015:  STOP 

016:  END 

017:  C 
018:  C 

019: FUNCTION ADFRST (POOL, LSTEND, I)

020:  INTEGER  ADFRST 

021:  INTEGER  POOL(7),  LSTEND,  I 

022:  INTEGER  LISTSZ 

023:  COMMON  /ALL/  LISTSZ 

024:  C 

025:  INTEGER  A 

026: C

027:  IF  (LSTEND  .GT.  LISTSZ)  THEN 

028:  LSTEND  =  LISTSZ  -  1 

029:  END  IF 

030:  LSTEND  =  LSTEND  +  1 

031: A = LSTEND

032: DOWHILE (A .GE. 1)

033: POOL(A+1) = POOL(A)

034: A = A - 1

035: ENDDO

036: C

037: POOL(1) = I

038: ADFRST = LSTEND

039:  RETURN 

040:  END 

041:  C 


042:  C 

043:  FUNCTION  ADLAST  (POOL,  LSTEND,  I) 

044:  INTEGER  ADLAST 

045:  INTEGER  POOL(7),  LSTEND,  I 

046:  INTEGER  LISTSZ 

047:  COMMON  /ALL/  LISTSZ 

048:  C 

049:  IF  (LSTEND  .LE.  LISTSZ)  THEN 

050: LSTEND = LSTEND + 1

051: POOL(LSTEND) = I

052:  END  IF 

053:  ADLAST  =  LSTEND 

054:  RETURN 

055:  END 

056:  C 

057:  C 

058:  FUNCTION  DELFST  (POOL,  LSTEND) 

059:  INTEGER  DELFST 

060:  INTEGER  POOL(7),  LSTEND 

061:  INTEGER  LISTSZ 

062:  COMMON  /ALL/  LISTSZ 

063:  C 

064:  INTEGER  A 

065:  IF  (LSTEND  .GT.  1)  THEN 

066:  A  =  1 

067: LSTEND = LSTEND - 1

068: DOWHILE (A .LE. LSTEND)

069: POOL(A) = POOL(A+1)

070: A = A + 1

071: ENDDO

072: END IF

073: DELFST = LSTEND

074:  RETURN 

075:  END 

076:  C 

077:  C 

078:  FUNCTION  DELLST  (LSTEND) 

079:  INTEGER  DELLST 

080:  INTEGER  LSTEND 

081:  C 

082: IF (LSTEND .GE. 1) THEN

083: LSTEND = LSTEND - 1

084:  ENDIF 

085:  DELLST  =  LSTEND 

086:  RETURN 

087:  END 

088:  C 

089:  C 

090:  FUNCTION  FIRST  (POOL,  LSTEND) 

091:  INTEGER  FIRST 

092:  INTEGER  POOL(7),  LSTEND 

093:  C 

094:  IF  (LSTEND  .LE.  1)  THEN 

095:  FIRST  =  0 


096: ELSE

097: FIRST = POOL(1)

098: END IF

099: RETURN

100:  END 

101:  C 
102:  C 

103:  FUNCTION  EMPTY  (LSTEND) 

104:  INTEGER  EMPTY 

105:  INTEGER  LSTEND 

106:  C 

107:  IF  (LSTEND  .LE.  1)  THEN 

108:  EMPTY  =  1 

109: ELSE

110: EMPTY = 0

111:  ENDIF 

112:  RETURN 

113:  END 

114:  C 

115:  C 

116:  FUNCTION  LSTLEN  (LSTEND) 

117:  INTEGER  LSTLEN 

118:  INTEGER  LSTEND 

119:  C 

120:  LSTLEN  =  LSTEND  -  1 

121:  RETURN 

122:  END 

123:  C 

124:  C 

125:  FUNCTION  NEWLST  (LSTEND) 

126: INTEGER NEWLST

127:  INTEGER  LSTEND 

128:  C 

129:  NEWLST  =  0 

130:  RETURN 

131:  END 

132:  C 
133:  C 

134: SUBROUTINE REVERS (POOL, LSTEND)

135:  INTEGER  POOL(7),  LSTEND 

136:  C 

137: INTEGER I, N

138:  C 

139:  N  =  LSTEND 

140:  I  =  1 

141:  DO  WHILE  (I  .LE.  N) 

142: POOL(I) = POOL(N)

143: N = N - 1

144: I = I + 1

145:  ENDDO 

146:  RETURN 

147:  END 
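As listed, the loop in REVERS assigns POOL(N) to POOL(I) without first saving the old value of POOL(I), so the front half of the list is overwritten rather than exchanged with the back half. The Python sketch below contrasts a literal transcription of that loop with a conventional two-index in-place reversal. It assumes that REVERS is meant to reverse the first N entries of POOL; the program's specification is not reproduced in this section. Python lists are 0-based, so the 1-based indices of the listing are shifted by one. The count `n` is passed in explicitly, and its relation to LSTEND is left open. Both function names are hypothetical.

```python
def revers_as_listed(pool: list[int], n: int) -> None:
    """Literal transcription of the REVERS loop (1-based I and N mapped to 0-based indices)."""
    i = 1
    while i <= n:
        pool[i - 1] = pool[n - 1]  # POOL(I) = POOL(N): the old POOL(I) is lost
        n -= 1
        i += 1


def reverse_in_place(pool: list[int], n: int) -> None:
    """Conventional reversal of pool[0:n]: swap the outermost pair, then move inward."""
    i, j = 0, n - 1
    while i < j:
        pool[i], pool[j] = pool[j], pool[i]
        i += 1
        j -= 1


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5]
    revers_as_listed(a, 5)
    b = [1, 2, 3, 4, 5]
    reverse_in_place(b, 5)
    print(a)  # [5, 4, 3, 4, 5]
    print(b)  # [5, 4, 3, 2, 1]
```

The swap needs either a temporary variable or, as in Python, simultaneous assignment. Without one, a single assignment per step can only copy values, which is exactly the behaviour the literal transcription exhibits.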


Program  4 

001: C NOTE THAT YOU DO NOT NEED TO VERIFY THE ROUTINES DRIVER, STREQ, WORDEQ,

002: C NXTSTR, ARRCPY, CHARPT, BEFORE, CHAREQ, AND WRDBEF. THEIR SOURCE

003: C CODE IS DESCRIBED AND INCLUDED AT THE END FOR COMPLETENESS.

004: C NOTE THAT FORMAT STATEMENTS FOR WRITE STATEMENTS INCLUDE A LEADING

005: C AND REQUIRED ' ' FOR CARRIAGE CONTROL
006: C THE SFORT LANGUAGE CONSTRUCT '.IF (EXPRESSION)' BEGINS A BLOCKED

007: C IF-THEN[-ELSE] STATEMENT, AND IT IS EQUIVALENT TO THE F77

008: C 'IF (EXPRESSION) THEN'.

009: C

010:  CALL  DRIVER 

011: STOP

012:  END 

013:  C 

014:  C 

015:  SUBROUTINE  MAINSB 

016:  C 

017: LOGICAL*1 U$KEY(3),U$AUTH(11),U$TITL(58),U$YEAR(2),U$ACTN(1)

018: LOGICAL*1 M$KEY(3),M$AUTH(11),M$TITL(58),M$YEAR(2)

019: LOGICAL*1 ZZZ(3), LASTUK(3), LASTMK(3)

020: LOGICAL*1 STREQ, CHAREQ, BEFORE, CHARPT

021:  INTEGER  I 

022:  C 

023: LOGICAL*1 WORD(500,12), REFKEY(1000,3)

024:  INTEGER  NUMWDS,  NUMREF,  PTR(500),  NEXT(1000) 

025: COMMON /WORDS/ WORD, REFKEY, NUMWDS, NUMREF, PTR, NEXT

026: C

027: WRITE(6,200)

028: 200 FORMAT(' ','UPDATED LIST OF MASTER ENTRIES')

029: DO 300 I = 1, 3

030: LASTMK(I) = CHARPT(' ')

031: LASTUK(I) = CHARPT(' ')

032: ZZZ(I) = CHARPT('Z')

033: 300 CONTINUE
034: C

035: NUMWDS = 0

036: NUMREF = 0

037:  CALL  GETNM(M$KEY,M$AUTH,M$TITL,M$YEAR,LASTMK) 

038:  CALL  GETNUP(U$KEY,U$AUTH,U$TITL,U$YEAR,U$ACTN,LASTUK) 

039: C

040:  DO  WHILE  ((.NOT.(STREQ(M$KEY,ZZZ,3)))  .OR. 

041:  X  (.NOT.(STREQ(U$KEY,ZZZ,3)))  ) 

042:  .IF  (STREQ(U$KEY,M$KEY,3)) 

043: .IF (.NOT.(CHAREQ(U$ACTN(1),'R')))

044:  WRITE(6,100)  U$KEY 

045: 100 FORMAT(' ','KEY ',3A1,' IS ALREADY IN FILE')

046: END IF

047: CALL OUTPUT(U$KEY,U$AUTH,U$TITL,U$YEAR)

048: CALL DICTUP(U$KEY,U$TITL,58)

049: CALL GETNM(M$KEY,M$AUTH,M$TITL,M$YEAR,LASTMK)

050: CALL GETNUP(U$KEY,U$AUTH,U$TITL,U$YEAR,U$ACTN,LASTUK)


051: END IF

052:  C 

053: .IF (BEFORE(M$KEY,3,U$KEY,3))

054: CALL OUTPUT(M$KEY,M$AUTH,M$TITL,M$YEAR)

055: CALL DICTUP(M$KEY,M$TITL,58)

056: CALL GETNM(M$KEY,M$AUTH,M$TITL,M$YEAR,LASTMK)

057:  ENDIF 

058:  C 

059: .IF (BEFORE(U$KEY,3,M$KEY,3))

060: .IF (CHAREQ(U$ACTN(1),'R'))

061: WRITE(6,110) U$KEY

062: 110 FORMAT(' ','UPDATE KEY ',3A1,' NOT FOUND')

063:  ENDIF 

064:  CALL  OUTPUT(U$KEY,U$AUTH,U$TITL,U$YEAR) 

065:  CALL  DICTUP(U$KEY,U$TITL,58) 

066: CALL GETNUP(U$KEY,U$AUTH,U$TITL,U$YEAR,U$ACTN,LASTUK)

067:  ENDIF 

068:  ENDDO 

069: C

070:  CALL  SRTWDS 

071:  CALL  PRTWDS 

072:  RETURN 

073:  END 

074:  C 
075:  C 

076: SUBROUTINE GETNM(KEY,AUTH,TITL,YEAR,LASTMK)

077: LOGICAL*1 KEY(3),AUTH(11),TITL(58),YEAR(2),LASTMK(3)

078: C

079: LOGICAL*1 SEQ, INLINE(80)

080: LOGICAL*1 BEFORE, CHARPT, CHAREQ

081: LOGICAL*1 GO$M, GO$U

082: COMMON /DRIV/ GO$M, GO$U

083: C

084: SEQ = 1

085:  DOWHILE  (SEQ) 

086:  .IF  (GO$M) 

087: C

088:  C  READ  FROM  THE  MASTER  FILE 
089:  C 

090: READ(10,200,END=299) INLINE

091:  ELSE 

092:  C 

093: C SEE REMARK ABOUT THE CHARACTER '%' LATER IN THE ROUTINE.

094: C

095: INLINE(1) = CHARPT('%')

096:  ENDIF 

097:  200  FORMAT(80A1) 

098: DO 210 I = 1, 3

099: KEY(I) = INLINE(I)

100:  210  CONTINUE 

101: DO 220 I = 1, 11

102:  AUTH(I)  =  INLINE(3+I) 

103:  220  CONTINUE 

104:  DO  230  I  =  1,  58 

105: TITL(I) = INLINE(14+I)


106:  230  CONTINUE 

107: DO 240 I = 1, 2

108: YEAR(I) = INLINE(72+I)

109:  240  CONTINUE 

110:  C 

111: C A METHOD OF SPECIFYING END-OF-FILE IN A FILE IS TO PUT THE CHARACTER '%'

112: C AS THE FIRST CHARACTER ON A LINE. THE DRIVER USES THIS FOR MULTIPLE
113: C SETS OF INPUT CASES.

114:  C 

115: .IF ((.NOT.(CHAREQ(KEY(1),'%'))) .AND.

116: X (BEFORE(KEY,3,LASTMK,3)) )

117: WRITE(6,250) KEY

118: 250 FORMAT(' ','KEY ',3A1,' OUT OF SEQUENCE')

119:  ELSE 

120:  CALL  ARRCPY(KEY,LASTMK,3) 

121: SEQ = 0

122:  END  IF 

123: .IF (CHAREQ(KEY(1),'%'))

124:  SEQ  =  0 

125:  DO  270  I  =  1,  3 

126: KEY(I) = CHARPT('Z')

127:  270  CONTINUE 

128:  END  IF 

129:  ENDDO 

130:  RETURN 

131:  299  CONTINUE 

132: GO$M = 0

133: DO 260 I = 1, 3

134: KEY(I) = CHARPT('Z')

135:  260  CONTINUE 
136:  RETURN 

137:  END 

138:  C 
139:  C 

140: SUBROUTINE GETNUP(KEY,AUTH,TITL,YEAR,ACTN,LASTUK)

141: LOGICAL*1 KEY(3),AUTH(11),TITL(58),YEAR(2),ACTN(1),LASTUK(3)

142: C

143: LOGICAL*1 SEQ, INLINE(80)

144: LOGICAL*1 BEFORE, CHARPT, CHAREQ

145: LOGICAL*1 GO$M, GO$U

146:  COMMON  /DRIV/  GO$M,  GO$U 

147:  C 

148:  SEQ  =  1 

149: DOWHILE (SEQ)

150:  .IF  (GO$U) 

151:  C 

152:  C  READ  FROM  THE  UPDATES  FILE 
153:  C 

154: READ(11,200,END=299) INLINE

155:  ELSE 

156:  C 

157: C SEE REMARK ABOUT THE CHARACTER '%' LATER IN THE ROUTINE.
158: C


159: INLINE(1) = CHARPT('%')

160: END IF

161: 200 FORMAT(80A1)

162:  DO  210  I  =  1,  3 

163:  KEY(I)  =  INLINE(I) 

164:  210  CONTINUE 

165:  DO  220  I  =  1,  11 

166: AUTH(I) = INLINE(3+I)

167: 220 CONTINUE

168: DO 230 I = 1, 58

169: TITL(I) = INLINE(14+I)

170: 230 CONTINUE

171: DO 240 I = 1, 2

172: YEAR(I) = INLINE(72+I)

173: 240 CONTINUE

174: ACTN(1) = INLINE(75)

175:  C 

176: C A METHOD OF SPECIFYING END-OF-FILE IN A FILE IS TO PUT THE CHARACTER '%'

177: C AS THE FIRST CHARACTER ON A LINE. THE DRIVER USES THIS FOR MULTIPLE
178: C SETS OF INPUT CASES.

179: C

180: .IF ((.NOT.(CHAREQ(KEY(1),'%'))) .AND.

181: X (BEFORE(KEY,3,LASTUK,3)) )

182: WRITE(6,250) KEY

183: 250 FORMAT(' ','KEY ',3A1,' OUT OF SEQUENCE')

184:  ELSE 

185:  CALL  ARRCPY(KEY,LASTUK,3) 

186: SEQ = 0

187: END IF

188: .IF (CHAREQ(KEY(1),'%'))

189: SEQ = 0

190: DO 270 I = 1, 3

191: KEY(I) = CHARPT('Z')

192:  270  CONTINUE 

193:  END  IF 

194:  ENDDO 

195:  RETURN 

196:  299  CONTINUE 

197:  GO$U  =  0 

198:  DO  260  I  =  1,  3 

199: KEY(I) = CHARPT('Z')

200:  260  CONTINUE 
201:  RETURN 

202:  END 

203:  C 
204:  C 

205: SUBROUTINE OUTPUT(KEY,AUTH,TITL,YEAR)

206: LOGICAL*1 KEY(3), AUTH(11), TITL(58), YEAR(2)

207: C

208: WRITE(8,200) KEY, AUTH, TITL, YEAR

209: 200 FORMAT(' ',3A1,11A1,58A1,2A1)

210:  RETURN 


SUBROUTINE  PRTWDS 


213:  C 
214: 

215:  C 

216: LOGICAL*1 WORD(500,12), REFKEY(1000,3)

217: INTEGER NUMWDS, NUMREF, PTR(500), NEXT(1000)

218:  COMMON  /WORDS/  WORD,  REFKEY,  NUMWDS,  NUMREF,  PTR,  NEXT 

219:  C 

220: C THE ABOVE GROUP OF DATA STRUCTURES SIMULATES A LINKED LIST.

221: C WORD(I,J) IS A KEYWORD - J RANGING FROM 1 TO 12
222: C REFKEY(PTR(I),K),K=1,3 IS THE FIRST 3 LETTER KEY THAT HAS AS A

223: C KEYWORD WORD(I,J),J=1,12

224: C REFKEY(NEXT(PTR(I)),K),K=1,3 IS THE SECOND 3 LETTER KEY THAT HAS

225: C AS A KEYWORD WORD(I,J),J=1,12

226: C REFKEY(NEXT(NEXT(PTR(I))),K),K=1,3 IS THE THIRD ... ETC.

227: C NEXT(J) IS EQUAL TO -1 WHEN THERE ARE NO MORE 3 LETTER KEYS FOR

228: C THE PARTICULAR KEYWORD

229:  C 

230:  INTEGER  I,  J 

231: LOGICAL*1 FLAG

232:  C 

233:  WRITE(6,200) 

234: 200 FORMAT(' ','KEYWORD REFERENCE LIST')

235: DO 210 I = 1, NUMWDS

236: FLAG = 1

237: WRITE(8,220) (WORD(I,J),J=1,12)

238: 220 FORMAT(' ',12A1)

239:  LAST  =  PTR(I) 

240:  DOWHILE  (FLAG) 

241: WRITE(8,230) (REFKEY(LAST,J),J=1,3)

242: 230 FORMAT(' ',3A1)

243: LAST = NEXT(LAST)

244: .IF (LAST .EQ. -1)

245: FLAG = 0

246:  END  IF 

247:  ENDDO 

248:  210  CONTINUE 
249:  RETURN 

250:  END 

251:  C 
252:  C 

253: SUBROUTINE DICTUP(KEY,STR,STRLEN)

254: LOGICAL*1 KEY(3), STR(120)

255: INTEGER STRLEN

256: C

257: LOGICAL*1 WDLEFT, FLAG, OKLEN, NEXTWD(120), WORDEQ

258: INTEGER LPTR, NXTSTR, LEN, LAB, I, K

259: C

260: LOGICAL*1 WORD(500,12), REFKEY(1000,3)

261: INTEGER NUMWDS, NUMREF, PTR(500), NEXT(1000)

262: COMMON /WORDS/ WORD, REFKEY, NUMWDS, NUMREF, PTR, NEXT


263: C

264: C THE ABOVE GROUP OF DATA STRUCTURES SIMULATES A LINKED LIST.

265: C WORD(I,J) IS A KEYWORD - J RANGING FROM 1 TO 12
266: C REFKEY(PTR(I),K),K=1,3 IS THE FIRST 3 LETTER KEY THAT HAS AS A

267: C KEYWORD WORD(I,J),J=1,12

268: C REFKEY(NEXT(PTR(I)),K),K=1,3 IS THE SECOND 3 LETTER KEY THAT HAS

269: C AS A KEYWORD WORD(I,J),J=1,12

270: C REFKEY(NEXT(NEXT(PTR(I))),K),K=1,3 IS THE THIRD ... ETC.

271: C NEXT(J) IS EQUAL TO -1 WHEN THERE ARE NO MORE 3 LETTER KEYS FOR

272: C THE PARTICULAR KEYWORD
273: C

274: WDLEFT = 1

275: LPTR = 1


276: C

277: DO WHILE (WDLEFT)

278: FLAG = 1

279: OKLEN = 1

280: LEN = NXTSTR(STR,STRLEN,LPTR,NEXTWD,120)

281: .IF (LEN .EQ. 0)

282: WDLEFT = 0

283: END IF

284: C

285: .IF (LEN .LE. 2)

286: OKLEN = 0

287: END IF

288: C

289: .IF (OKLEN)

290: I = 1

291: DO WHILE ((I .LE. NUMWDS) .AND. FLAG)

292: .IF (WORDEQ(NEXTWD,I))

293: LAB = I

294: FLAG = 0

295: ENDIF

296: I = I + 1

297: ENDDO

298: .IF (FLAG)

299: NUMWDS = NUMWDS + 1

300: NUMREF = NUMREF + 1

301: DO 300 K = 1, 12

302: WORD(NUMWDS,K) = NEXTWD(K)

303: 300 CONTINUE

304: PTR(NUMWDS) = NUMREF

305: DO 310 K = 1, 3

306: REFKEY(NUMREF,K) = KEY(K)

307: 310 CONTINUE

308: NEXT(NUMREF) = -1

309: ELSE

310: NUMREF = NUMREF + 1

311: DO 320 K = 1, 3

312: REFKEY(NUMREF,K) = KEY(K)

313: 320 CONTINUE

314: NEXT(NUMREF) = PTR(LAB)

315:  PTR(LAB)  =  NUMREF 

316:  ENDIF 

317:  ENDIF 

318:  END  DO 

319:  C 

320:  RETURN 

321:  END 

322:  C 
323:  C 

324:  SUBROUTINE  SRTWDS 

325:  C 

326: LOGICAL*1 WORD(500,12), REFKEY(1000,3)

327: INTEGER NUMWDS, NUMREF, PTR(500), NEXT(1000)

328: COMMON /WORDS/ WORD, REFKEY, NUMWDS, NUMREF, PTR, NEXT

329:  C 

330: C THE ABOVE GROUP OF DATA STRUCTURES SIMULATES A LINKED LIST.

331: C WORD(I,J) IS A KEYWORD - J RANGING FROM 1 TO 12
332: C REFKEY(PTR(I),K),K=1,3 IS THE FIRST 3 LETTER KEY THAT HAS AS A

333: C KEYWORD WORD(I,J),J=1,12

334: C REFKEY(NEXT(PTR(I)),K),K=1,3 IS THE SECOND 3 LETTER KEY THAT HAS

335: C AS A KEYWORD WORD(I,J),J=1,12

336: C REFKEY(NEXT(NEXT(PTR(I))),K),K=1,3 IS THE THIRD ... ETC.

337: C NEXT(J) IS EQUAL TO -1 WHEN THERE ARE NO MORE 3 LETTER KEYS FOR

338: C THE PARTICULAR KEYWORD
339: C

340: INTEGER I, J, K, LAB, LOWERB, UPPERB

341: LOGICAL*1 WRDBEF, NEXTWD(12)

342:  C 

343:  UPPERB  =  NUMWDS  -  1 

344: DO 400 I = 1, UPPERB

345: LOWERB = I+1

346: DO 410 J = LOWERB, NUMWDS

347: .IF (WRDBEF(J,I))

348:  DO  300  K  =  1,  12 

349: NEXTWD(K) = WORD(I,K)

350:  300  CONTINUE 

351:  LAB  =  PTR(I) 

352: DO 310 K = 1, 12

353: WORD(I,K) = WORD(J,K)

354:  310  CONTINUE 

355:  PTR(I)  =  PTR(J) 

356:  DO  320  K  =  1,  12 

357: WORD(J,K) = NEXTWD(K)

358:  320  CONTINUE 

359:  PTR(J)  =  LAB 

360:  ENDIF 

361:  410  CONTINUE 

362:  400  CONTINUE 
363:  C 

364:  RETURN 
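The comments in PRTWDS, DICTUP, and SRTWDS describe a linked list simulated with parallel arrays. PTR(I) gives the first reference key for keyword I, and NEXT chains further references, with -1 marking the end. The Python sketch below models that structure under a few stated assumptions: indices are 0-based, keywords are plain strings rather than 12-character arrays, and the class and method names are hypothetical. Following DICTUP as listed, words of two or fewer characters are skipped. A new reference to an existing keyword is linked in at the head of its chain (NEXT(NUMREF) = PTR(LAB), then PTR(LAB) = NUMREF), so walking a chain the way PRTWDS does yields the most recently added reference first.

```python
class KeywordIndex:
    """Hypothetical model of the WORD/REFKEY/PTR/NEXT arrays (0-based; -1 ends a chain)."""

    def __init__(self) -> None:
        self.word: list[str] = []    # WORD(I,*): one entry per distinct keyword
        self.ptr: list[int] = []     # PTR(I): head of keyword I's reference chain
        self.refkey: list[str] = []  # REFKEY(R,*): 3-letter reference keys
        self.next: list[int] = []    # NEXT(R): following reference, or -1

    def add(self, keyword: str, key: str) -> None:
        """Record that entry `key` uses `keyword`, as DICTUP does for each title word."""
        if len(keyword) <= 2:        # DICTUP skips words of two or fewer characters
            return
        self.refkey.append(key)
        r = len(self.refkey) - 1     # index of the new reference (NUMREF)
        if keyword in self.word:
            lab = self.word.index(keyword)
            self.next.append(self.ptr[lab])  # NEXT(NUMREF) = PTR(LAB)
            self.ptr[lab] = r                # PTR(LAB) = NUMREF
        else:
            self.word.append(keyword)
            self.ptr.append(r)               # PTR(NUMWDS) = NUMREF
            self.next.append(-1)             # NEXT(NUMREF) = -1

    def refs(self, keyword: str) -> list[str]:
        """Walk one chain the way PRTWDS does, stopping at -1."""
        out = []
        r = self.ptr[self.word.index(keyword)]
        while r != -1:
            out.append(self.refkey[r])
            r = self.next[r]
        return out


if __name__ == "__main__":
    idx = KeywordIndex()
    idx.add("TESTING", "ABC")
    idx.add("OF", "ABC")        # ignored: too short
    idx.add("FAULTS", "ABC")
    idx.add("TESTING", "XYZ")
    print(idx.refs("TESTING"))  # ['XYZ', 'ABC']
    print(idx.refs("FAULTS"))   # ['ABC']
```

SRTWDS exchanges PTR(I) and PTR(J) together with the corresponding WORD rows. Sorting the keywords therefore leaves every chain attached to its keyword; in this model, that corresponds to reordering `word` and `ptr` in step while leaving `refkey` and `next` untouched.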
Figure 1. Capabilities of the testing methods.
Figure 2. Structure of goals/subgoals/questions for the testing experiment.
Figure 6. Distribution of faults in the programs.
Figure 7. Fault classification and manifestation.
Figure 9. Overall summary of detection effectiveness data.

Note: some data pertain only to the on-line techniques (*), and some data were collected only in certain phases.
Figure 11. Overall summary for number of faults detected.

Note: some data pertain only to the on-line techniques (*), and some data were collected only in certain phases.
... by phase. Key: code readers (C), functional testers (F), and structural testers (S).

[Plot: distributions for code readers (C), functional testers (F), and structural testers (S), in three panels: Phase 1 (87 observations), Phase 2 (39 observations), and Phase 3 (96 observations); horizontal axis marked 0 to 15.]

Figure  14.  Overall  summary  for  fault  detection  rate  (#  faults 
detected  per  hour). 
Figure 15. Characterization of the faults detected.
Figure 16. Characterization of faults detected by the three techniques: 10 omission (O) vs. 24 commission (x). The vertical axis is the number of persons using the particular technique that observed the fault.
Figure 17. Characterization of faults detected by the three techniques: initialization (2-A), computation (3-P), control (7-C), data (3-D), interface (13-I), and cosmetic (1-S). The vertical axis is the number of persons using the particular technique that observed the fault.